Commit graph

1119 commits

Author SHA1 Message Date
Alyssa Rosenzweig
e2a0d64d52 asahi: Add pass to predicate layer ID reads
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-10-01 12:37:55 -04:00
Alyssa Rosenzweig
e518c92d26 asahi: Assume LAYER is flat-shaded
It can't be anything else, this makes sure the varyings are sorted properly.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-10-01 12:37:55 -04:00
Alyssa Rosenzweig
8a48af4f8f agx/lower_tilebuffer: Support spilled layered RTs
If we spill render targets with a layered framebuffer, our spilled targets are
assumed to be 2D Arrays (in general). We need to use arrayed image operations to
load/store from these. The layer is given by the layer as read in the fragemnt
shader. This handles the eMRT portion of layered rendering.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-10-01 12:37:55 -04:00
Alyssa Rosenzweig
041451b655 agx/tilebuffer: Support layered layouts
Just add a flag for it. We don't care about the actual # of layers when
calculating the layout, only the boolean fact of being layered or not. The
reason we need this at all is because the eMRT implementation needs to
account for layering and that is only keyed off the tilebuffer layout.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-10-01 12:37:55 -04:00
Alyssa Rosenzweig
b252630604 agx: Support packed layered rendering writes
With the new pass.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-10-01 12:32:12 -04:00
Alyssa Rosenzweig
4a954dff07 asahi,agx: Select layered rendering outputs
These 2 are together

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-10-01 12:32:12 -04:00
Alyssa Rosenzweig
88fd76d378 asahi: Add helper to get layer id in internal program
For background/EOT only.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-10-01 12:32:12 -04:00
Alyssa Rosenzweig
7d94f2ee49 agx: Add pass to lower layer ID writes
The hardware needs the layer ID and the viewport index packed together. That
consumes an entire varying slot, if we want those available in the frag shader
we need a separate slot. Add a pass to insert the extra packed write.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-10-01 12:32:12 -04:00
Alyssa Rosenzweig
175819eec6 agx: Handle layered block image stores
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-10-01 12:32:12 -04:00
Alyssa Rosenzweig
c3a208d6d9 agx: Pack block image store dim correctly
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-10-01 12:32:12 -04:00
Alyssa Rosenzweig
da0da5d6f8 agx/nir_lower_texture: Allow disabling layer clamping
For background program with layered.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-10-01 12:32:12 -04:00
Alyssa Rosenzweig
10b9c2fa36 nir: Support arrays in block_image_store_agx
For layered rendering, runs once per layer.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-10-01 12:32:12 -04:00
Alyssa Rosenzweig
d83d24e96a agx: Insert jmp_exec_none instructions
With the exception of the backwards branch for loops, all the control flow we
insert during instruction selection just predicates instructions rather than
actually jumping around. That means, for example, we execute both sides of the
if even for a uniform condition! That's inefficient. The solution is insert
jmp_exec_none instructions after control flow in order to skip unexecuted
regions, which is much faster than predicating them out. However, jmp_exec_none
is costly in itself, so we need to use a heuristic to determine when it's
actually beneficial.

This uses a very simple heuristic for this purpose. However, it is a massive
performance speed-up for Dolphin uber shaders: 39fps -> 67fps at 2x resolution.
Nearly a doubling of performance!

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-10-01 12:32:11 -04:00
Alyssa Rosenzweig
79c4d4213c agx: Add agx_prev_block helper
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-10-01 12:32:11 -04:00
Alyssa Rosenzweig
dd6106c8bd agx: Add jumps to block ends
jmp_exec_none variant that jumps to the last instruction of the target block,
rather than the beginning. This is convenient for skipping over elses, while
still executing the block-final pop_exec instruction. Similarly for skipping
over loop bodies while still executing the block-final pop_exec, after break
instructions.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-10-01 12:32:11 -04:00
Alyssa Rosenzweig
22ab505a3d agx: Augment if/else/while_cmp with a target
Add an optional pointer to a target block for these instructions. This does NOT
act like a logical branch, and does NOT get added to the logical control flow.
It is ignored wholesale until after RA, when physical edges may be inserted by a
pass we add later in this series.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-10-01 12:32:11 -04:00
Alyssa Rosenzweig
d05657e0d6 agx: Hoist sample_mask/zs_emit
Although this is well-motivated, perf effect seems to be neglible for Dolphin.
It does prevent the scheduler from making things worse by sinking these
instructions though, so as a way to prevent future problems this seems sensible.

The kind of problem this affects (late discard) isn't modelled in shader-db.
Nevertheless, nothing concerning there:

   total instructions in shared programs: 1756699 -> 1756722 (<.01%)
   instructions in affected programs: 10106 -> 10129 (0.23%)
   helped: 21
   HURT: 41
   Inconclusive result (value mean confidence interval includes 0).

   total bytes in shared programs: 11525404 -> 11525452 (<.01%)
   bytes in affected programs: 72900 -> 72948 (0.07%)
   helped: 27
   HURT: 41
   Inconclusive result (value mean confidence interval includes 0).

   total halfregs in shared programs: 483394 -> 483286 (-0.02%)
   halfregs in affected programs: 4945 -> 4837 (-2.18%)
   helped: 88
   HURT: 78
   Inconclusive result (value mean confidence interval includes 0).

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-10-01 12:32:11 -04:00
Alyssa Rosenzweig
0d8362b842 agx: Align the reg file for 256-bit vectors
This fixes live range splitting with 3D textureGrad(), which involves vectors
larger than the natural 128-bit maximum and hence requires special handling.
Fixes this assert with a combination of debug flags and new patches:

   unsigned int find_best_region_to_evict(struct ra_ctx *, unsigned int,
   unsigned int *, unsigned int *):
   Assertion `(rctx->bound % size) == 0 && "register file size must be aligned
   to the maximum vector size"' failed

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-10-01 12:32:11 -04:00
Alyssa Rosenzweig
d1eb17e92e treewide: Drop nir_ssa_for_src users
Via Coccinelle patch:

    @@
    expression b, s, n;
    @@

    -nir_ssa_for_src(b, *s, n)
    +s->ssa

    @@
    expression b, s, n;
    @@

    -nir_ssa_for_src(b, s, n)
    +s.ssa

Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25247>
2023-09-18 10:25:17 -04:00
Alyssa Rosenzweig
0df0980fc4 agx: Enable sinking ALU
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24833>
2023-09-18 08:38:16 -04:00
Alyssa Rosenzweig
fb60626260 agx: Run opt_idiv_const after lowering texture
Shaves 10 instructions off the cube map array lowering.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25052>
2023-09-05 18:50:34 +00:00
Alyssa Rosenzweig
49951ef3cc agx: Lower coordinates for cube map array images
Annoyingly different from texture coordinates.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25052>
2023-09-05 18:50:34 +00:00
Alyssa Rosenzweig
fb76f6cc6e agx: Handle cube arrays when clamping arrays
Need to adjust the component.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25052>
2023-09-05 18:50:34 +00:00
Alyssa Rosenzweig
54ebddaa0f ail: Force page-alignment for layered attachments
When rendering to a layered depth/stencil attachment, we specify the layer
stride in pages. That means that depth/stencil targets must be page-aligned to
be rendered to correctly.

If we're merely sampling, not rendering, we do not need the extra alignment. So
we add a flag to handle this case so we keep passing the generated ail tests.

Fixes KHR-GLES31.core.texture_cube_map_array.color_depth_attachments

Similarly, we page-align colour attachments. I don't have a good theoretical
justification for this part, but it seems to be necessary and layered rendering
fails otherwise. Possibly the PBE requires page-aligned layers unconditionally?

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25052>
2023-09-05 18:50:34 +00:00
Alyssa Rosenzweig
7895d5b79c agx: Add unit test for cmp+sel fusing
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25052>
2023-09-05 18:50:34 +00:00
Alyssa Rosenzweig
bdad7992bc agx: Add unit test for if_cmp fusing
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25052>
2023-09-05 18:50:34 +00:00
Alyssa Rosenzweig
08e0c5a9cf agx: Fuse compares into selects
This lets us save a LOT of instructions at the cost of increased register
pressure. However, on my shader-db, this is still coming out ahead since no
shaders are hurt for thread count/spills, and only 1/10 of the shaders helped
for instruction count are hurt for register pressure. The shaders most hurt
for pressure have very low pressure (7 -> 15 is the worst case) and you need a
certain number of registers to use a 4 source instruction at all. Analyzing the
hurt shaders, nothing concerns me too much ... this isn't as bad as I feared.

So I think at this point it's worth ripping off the bandage, given the massive
potential for instruction count win. This is a big improvement for some of the
shaders I'm working on for my $SECRET_PROJECT.

   total instructions in shared programs: 1784943 -> 1775169 (-0.55%)
   instructions in affected programs: 644211 -> 634437 (-1.52%)
   helped: 3498
   HURT: 38
   Instructions are helped.

   total bytes in shared programs: 11720734 -> 11643224 (-0.66%)
   bytes in affected programs: 4370986 -> 4293476 (-1.77%)
   helped: 3572
   HURT: 36
   Bytes are helped.

   total halfregs in shared programs: 474094 -> 475165 (0.23%)
   halfregs in affected programs: 12821 -> 13892 (8.35%)
   helped: 65
   HURT: 247
   Halfregs are HURT.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25052>
2023-09-05 18:50:34 +00:00
Alyssa Rosenzweig
e7ffc799d1 agx: Fuse conditions into if's
Simple greedy thing that has the potential to inflate register pressure but
reduces instructions. Thanks to the recent loop work that turns if { break }
into while_icmp, this also implicitly handles fusing conditions into loops,
which is what actually prompted this.

Surprisingly, this helps register pressure on my shader-db (no change to thread
count), I guess by eliminating the boolean temps in case where the sources are
used multiple times.

   total instructions in shared programs: 1786561 -> 1784943 (-0.09%)
   instructions in affected programs: 128557 -> 126939 (-1.26%)
   helped: 474
   HURT: 13
   Instructions are helped.

   total bytes in shared programs: 11733236 -> 11720734 (-0.11%)
   bytes in affected programs: 976034 -> 963532 (-1.28%)
   helped: 521
   HURT: 13
   Bytes are helped.

   total halfregs in shared programs: 474245 -> 474094 (-0.03%)
   halfregs in affected programs: 1869 -> 1718 (-8.08%)
   helped: 28
   HURT: 7
   Halfregs are helped.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25052>
2023-09-05 18:50:34 +00:00
Alyssa Rosenzweig
f17ad0c516 agx: Generate unfused comparison pseudo ops
So we can optimize them easier.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25052>
2023-09-05 18:50:34 +00:00
Alyssa Rosenzweig
ed6e391349 agx: Add pseudo-instructions for icmp/fcmp
Easier to optimize with.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25052>
2023-09-05 18:50:34 +00:00
Alyssa Rosenzweig
139e56c0db agx: Only use nest by 1 for loops w/o continue
Apple doesn't do this, but it should be equivalent and it makes it easier to see
that we can use while_icmp for break_if_icmp in loops that don't use continue
(which Apple does do). So, the effect of this commit is to use while_icmp for
most breaks, which saves an instruction.

   total instructions in shared programs: 1764199 -> 1764076 (<.01%)
   instructions in affected programs: 24149 -> 24026 (-0.51%)
   helped: 78
   HURT: 0
   Instructions are helped.

   total bytes in shared programs: 11609306 -> 11608322 (<.01%)
   bytes in affected programs: 164604 -> 163620 (-0.60%)
   helped: 78
   HURT: 0
   Bytes are helped.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25052>
2023-09-05 18:50:34 +00:00
Alyssa Rosenzweig
8f06252e9b agx: Add helper to determine if a NIR loop uses continue
We need to emit extra instructions to handle continues, but if we don't have
any, we can omit those.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25052>
2023-09-05 18:50:34 +00:00
Alyssa Rosenzweig
5c9495cf37 agx: Omit while_icmp without continue
The only role of the while_icmp at the end of a NIR loop is to make continue
jumps work. If, after emitting the loop, we learn that there are no continues,
there is no need to insert a while_icmp since it would be a no-op anyway.

   total instructions in shared programs: 1764311 -> 1764199 (<.01%)
   instructions in affected programs: 26321 -> 26209 (-0.43%)
   helped: 82
   HURT: 0
   Instructions are helped.

   total bytes in shared programs: 11609978 -> 11609306 (<.01%)
   bytes in affected programs: 178842 -> 178170 (-0.38%)
   helped: 82
   HURT: 0
   Bytes are helped.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25052>
2023-09-05 18:50:34 +00:00
Alyssa Rosenzweig
e71a1469a8 agx: Omit push_exec at top level
In general, loops need a push_exec at the start for correctness. However, a
push_exec at the top level (non-nested) is a no-op, so we can omit and save a
few cycles.

   total instructions in shared programs: 1764350 -> 1764311 (<.01%)
   instructions in affected programs: 7339 -> 7300 (-0.53%)
   helped: 36
   HURT: 0
   Instructions are helped.

   total bytes in shared programs: 11610212 -> 11609978 (<.01%)
   bytes in affected programs: 48638 -> 48404 (-0.48%)
   helped: 36
   HURT: 0
   Bytes are helped.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25052>
2023-09-05 18:50:34 +00:00
Alyssa Rosenzweig
6e0ae2c316 agx: Detect conditional breaks
Search for code like

   if ... {
      break
   }

and replace with a break_if pseudo-instruction for optimized handling, since the
break_if lowering is better than the original code.

   total instructions in shared programs: 1764596 -> 1764350 (-0.01%)
   instructions in affected programs: 24540 -> 24294 (-1.00%)
   helped: 78
   HURT: 0
   Instructions are helped.

   total bytes in shared programs: 11611196 -> 11610212 (<.01%)
   bytes in affected programs: 166458 -> 165474 (-0.59%)
   helped: 78
   HURT: 0
   Bytes are helped.

shader-db probably understates the benefit here, since this optimizes the body
of loops.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25052>
2023-09-05 18:50:34 +00:00
Alyssa Rosenzweig
a009f39fca agx: Use agx_first_instr
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25052>
2023-09-05 18:50:34 +00:00
Alyssa Rosenzweig
aad7d5288a agx: Add agx_first/last_instr helpers
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25052>
2023-09-05 18:50:34 +00:00
Alyssa Rosenzweig
ffb64283ee agx: Add break_if_*cmp instructions
To faciliate break optimizations. We use a more efficient lowering than the
literal transition of the NIR.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25052>
2023-09-05 18:50:34 +00:00
Alyssa Rosenzweig
ff816f224b agx: Split nest instruction into begin_cf + break
We use it for two different things. Pseudo-instructions are cheap, split it up
for easier optimization passes. This also fixes the schedule classes.. we can
move the cf_begin around if we want, it's inert.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25052>
2023-09-05 18:50:34 +00:00
Alyssa Rosenzweig
b89c048c9b agx: Lower nest later
As part of pseudo op lowering. Simpler and will simplify control flow opts.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25052>
2023-09-05 18:50:34 +00:00
Alyssa Rosenzweig
b25b36a9e3 agx: Expand nest
For breaking out of deeper control flow.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25052>
2023-09-05 18:50:34 +00:00
Alyssa Rosenzweig
8405444143 agx: Lower pseudo-ops later
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25052>
2023-09-05 18:50:34 +00:00
Alyssa Rosenzweig
f9343fe5ca agx: Remove logical_end instructions
They're more trouble than they're worth for us. They were originally lifted
unthinkingly from ACO, where I assume they're necessary for software CF
lowering, but they're just an inconvenient convenience for us. Remove em.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25052>
2023-09-05 18:50:34 +00:00
Alyssa Rosenzweig
a2e5d1ddd1 asahi: Force translucency for ignored render targets
If we bound 4 render targets but we only write to 1 of them, the other 3 need
their contents preserved. This requires either properly configuring HSR to
implement colour masking (TODO) or using the big hammer of setting TRANSLUCENT.
This patch picks the latter for now.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25052>
2023-09-05 18:50:34 +00:00
Alyssa Rosenzweig
62a2bdde7f agx: Lower pack_32_4x8_split
Fixes test_integer_ops integer_dot_product.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25052>
2023-09-05 18:50:34 +00:00
Asahi Lina
c83672a0b3 asahi: Fix VDM pipeline field width
The lower bits have a special meaning, like on the other pipelines.

Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25052>
2023-09-05 18:50:34 +00:00
Asahi Lina
0424017e72 asahi: decode: Do not assert on buffer overruns
This kills the hypervisor, let's just print and return.

Also flush after decoding, so that if something else goes wrong at least
we get the logs up to that point.

Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25052>
2023-09-05 18:50:34 +00:00
Asahi Lina
acd5ed0451 asahi: decode: Implement VDM call/ret
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25052>
2023-09-05 18:50:34 +00:00
Asahi Lina
c43dbadaa0 asahi: cmdbuf: Identify call/ret bits
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25052>
2023-09-05 18:50:34 +00:00
Alyssa Rosenzweig
119e5b9719 agx: Schedule for register pressure
Since we register allocate in SSA, the number of registers required (register
demand) equals to the maximum number of simultaneous live values (register
pressure). So if we can reduce register pressure, we are guaranteed to reduce
register demand. Even an ineffective heuristic like randomly swapping
instructions can only reduce pressure as long as it's conservative.

This implements one such heuristic: in each block, schedule backwards, selecting
the free instruction that looks like it will reduce liveness the most. In other
words, the greedy algorithm to reduce register pressure. At the end of the
block, if we haven't actually reduced pressure, we bail. This isn't optimal, but
it's well-motivated and optimally handles special cases (like 0-source
instructions).

This is based on the scheduler I originally wrote for Mali.

In my Dolphin ubershader branch, this improved performance at native 4K by 10fps
(105fps->115fps) when I measured together with some other optimizations. On top
of my current next (which notably includes nir_opt_sink improvements), this
commit alone goes (53fps->54fps) which is considerably less impressive :-p

shader-db results are a win, but not as large as we might hope. Instruction
count win seems to be from the smaller live ranges being easier on RA (fewer
swaps / moves). The two shaders affected for thread count are from fifa mobile,
which go from 640 threads ->
1024 (full occupancy). In other words... this heuristic does an excellent job in
a small subset of shaders. The Dolphin ubershader win was real, though :~)

Note these shader-db wins are on top of a branch with the nir_opt_sink
improvements. Without that, the stats are much better... The schedulers have
some overlap, but they're better together.

   total instructions in shared programs: 1766635 -> 1763496 (-0.18%)
   instructions in affected programs: 445855 -> 442716 (-0.70%)
   helped: 1963
   HURT: 350
   Instructions are helped.

   total bytes in shared programs: 11597648 -> 11586924 (-0.09%)
   bytes in affected programs: 3106230 -> 3095506 (-0.35%)
   helped: 2003
   HURT: 374
   Bytes are helped.

   total halfregs in shared programs: 504609 -> 481980 (-4.48%)
   halfregs in affected programs: 138322 -> 115693 (-16.36%)
   helped: 3405
   HURT: 311
   Halfregs are helped.

   total threads in shared programs: 18839936 -> 18840704 (<.01%)
   threads in affected programs: 1280 -> 2048 (60.00%)
   helped: 2
   HURT: 0

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25052>
2023-09-05 18:50:34 +00:00