Commit graph

211861 commits

Author SHA1 Message Date
Francisco Jerez
760437c4c4 intel/brw/xe3+: Override P value of GRF register classes to increase thread parallelism.
This causes the graph coloring allocator to use the optimistic
coloring codepath for all nodes whose total Q value exceeds the
threshold of 96 GRFs, in order to do a better job at minimizing the
register requirement of programs even when they are trivially
colorable.  At the threshold of 96 GRFs the number of threads
available per EU starts decreasing as the number of register blocks
requested by the program increases, so decreasing the number of
registers can increase performance.

That showed up in some test cases as a performance inversion from the
enabling of VRT, since the extension of the register set to 256 GRFs
has the side effect of making some non-trivially colorable programs
trivially colorable, which would cause the register allocator to do a
worse job at ordering the (trivial) allocations due to the optimistic
coloring path being skipped, leading to increased register use and
reduced performance.

The following Traci test cases improve significantly as a result of
this change (4 iterations, 5% significance):

MetroExodus-trace-dx11-2160p-ultra:                 1.90% ±0.85%
BaldursGate3-trace-dx11-1440p-ultra:                1.47% ±0.38%
Palworld-trace-dx11-1080p-med:                      1.01% ±0.09%
TerminatorResistance-trace-dx11-2160p-ultra:        0.95% ±0.29%
Control-trace-dx11-1440p-high:                      0.87% ±0.50%

Even though lowering the P value threshold is expected to have a cost
in compile time theoretically due to the increased use of the slower
optimistic path of the graph coloring allocator, this doesn't actually
show up in my numbers, my shader-db and fossil-db compile-time numbers
don't show any statistically significant change (13 iterations, 5%
significance).

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36618>
2025-09-10 02:15:55 +00:00
Francisco Jerez
74168a601e util/ra: Allow driver to override class P value.
This is helpful for the driver to have the option to provide a custom
threshold for the PQ test performed by the graph coloring algorithm.

A threshold lower than the physical number of registers is helpful on
platforms where the number of registers used can impose a limit on the
thread parallelism of the program.  In such platforms even though a
passing PQ test guarantees that the node can be pushed onto the stack
and neglected while coloring the remaining nodes, the ordering in
which this happen can have a dramatic effect in the register pressure
of the resulting shader and therefore also on the thread parallelism
of the program.

Setting a P value threshold lower than the real P value will cause
nodes with Q value above the threshold to use the existing optimistic
coloring heuristic that takes the effort of ordering nodes in the
stack by Q value, in order to do a better job at minimizing the total
register requirement of the program.  Even though this causes us to
hit the optimistic codepaths for trivially colorable nodes the
interference graph is still guaranteed to be trivially colorable if it
was trivially colorable without the override.

The use of a threshold lower than the real P value will come at a
compile-time performance cost, the specific trade-off between
compile-time and run-time can be adjusted by the driver based on the
number of registers available to each thread without causing a hit to
thread parallelism.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36618>
2025-09-10 02:15:55 +00:00
Francisco Jerez
35ac517780 intel/brw/xe3+: Define BRW_SCHEDULE_PRE_LATENCY scheduling mode.
This defines a new pre-RA scheduling mode similar to BRW_SCHEDULE_PRE
but more aggressive at optimizing for minimum latency rather than
minimum register usage.  The main motivation is that on recent xe3
platforms we use a register allocation heuristic that packs variables
more tightly at the bottom of the register file instead of the
round-robin heuristic we used on previous platforms, since as a result
of VRT there is a parallelism penalty when a program uses more GRF
registers than necessary.  Unfortunately the xe3 tight-packing
heuristic severely constrains the work of the post-RA scheduler due to
the false dependencies introduced during register allocation, so we
can do a better job by making the scheduler aware of instruction
latencies before the register allocator introduces any false
dependencies.

This can lead to higher register pressure, but only when the scheduler
decides it could save cycles by extending a live range.  It makes
sense to preserve the preexisting BRW_SCHEDULE_PRE as a separate mode
since some workloads can still benefit from neglecting latencies
pre-RA due to the trade-off mentioned between parallelism and GRF use,
a future commit will introduce a more accurate estimate of the
expected relative performance of BRW_SCHEDULE_PRE
vs. BRW_SCHEDULE_PRE_LATENCY taking into account this trade-off.

In theory this could also be helpful on earlier pre-xe3 platforms, but
the benefit should be significantly smaller due to the different RA
heuristic so it hasn't been tested extensively pre-xe3.

The following Traci tests are improved significantly by this change on
PTL (nearly all tests that run on my system are affected positively):

Ghostrunner2-trace-dx11-1440p-ultra:                7.12% ±0.36%
SpaceEngineers-trace-dx11-2160p-high:               5.77% ±0.43%
HogwartsLegacy-trace-dx12-1080p-ultra:              4.40% ±0.03%
Naraka-trace-dx11-1440p-highest:                    3.06% ±0.43%
MetroExodus-trace-dx11-2160p-ultra:                 2.26% ±0.60%
Fortnite-trace-dx11-2160p-epix:                     2.12% ±0.53%
Nba2K23-trace-dx11-2160p-ultra:                     1.98% ±0.30%
Control-trace-dx11-1440p-high:                      1.93% ±0.36%
GodOfWar-trace-dx11-2160p-ultra:                    1.62% ±0.47%
TotalWarPharaoh-trace-dx11-1440p-ultra:             1.55% ±0.18%
MountAndBlade2-trace-dx11-1440p-veryhigh:           1.51% ±0.37%
Destiny2-trace-dx11-1440p-highest:                  1.44% ±0.34%
GtaV-trace-dx11-2160p-ultra:                        1.26% ±0.27%
ShadowTombRaider-trace-dx11-2160p-ultra:            1.10% ±0.58%
Borderlands3-trace-dx11-2160p-ultra:                0.95% ±0.43%
TerminatorResistance-trace-dx11-2160p-ultra:        0.87% ±0.22%
BaldursGate3-trace-dx11-1440p-ultra:                0.84% ±0.28%
CitiesSkylines2-trace-dx11-1440p-high:              0.82% ±0.22%
PubG-trace-dx11-1440p-ultra:                        0.72% ±0.37%
Palworld-trace-dx11-1080p-med:                      0.71% ±0.26%
Superposition-trace-dx11-2160p-extreme:             0.69% ±0.19%

The compile-time cost of shader-db increases significantly by 1.85%
after this commit (14 iterations, 5% significance), the compile-time
of fossil-db doesn't change significantly in my setup.

v2: Addressed interaction with 81594d0db1,
    since the code that calculates deps, delays and exits is no longer
    mode-independent after this change.  Instead of reverting that
    commit (which is non-trivial and would have a greater compile-time
    hit) simply reconstruct the scheduler object during the transition
    between BRW_SCHEDULE_PRE_LATENCY and any other PRE mode that
    doesn't require instruction latencies.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36618>
2025-09-10 02:15:55 +00:00
Francisco Jerez
501b1cbc2c intel/brw: Fix behavior of scheduler around flag register writes.
We were currently treating explicit flag writes and reads as a full
scheduler barrier, which is unnecessary since the tracking we already
do handles explicit flag access correctly so there is no reason for
taking a possibly large performance hit from add_barrier_deps().

Found by inspection while trying to understand the poor scheduling of
some fragment shaders.  Improves performance by a small but
statistically significant amount (4 iterations, 5% significance) for
the following Traci tests in combination with a subsequent commit that
makes the pre-RA scheduler sensitive to instruction latencies:

SpaceEngineers-trace-dx11-2160p-high:               0.66% ±0.30%
MountAndBlade2-trace-dx11-1440p-veryhigh:           0.62% ±0.23%

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36618>
2025-09-10 02:15:55 +00:00
Francisco Jerez
17b068ed1c intel/brw/xe3+: Handle SENDG in instruction scheduler.
We weren't handling the SHADER_OPCODE_SEND_GATHER instruction in the
instruction scheduler and this was leading to reduced performance in
many programs since SEND instructions have the longest latency and
tend to be among the most critical to schedule efficiently.  Handle
SENDG similarly to SEND since the timings of both instructions are
mostly bound by the shared function which doesn't care if the message
was sent by SEND or SENDG.

Improves performance significantly in the following Traci traces (4
iterations, 5% significance), most of them regressions from SENDG
being enabled:

MetroExodus-trace-dx11-2160p-ultra:                 1.99% ±0.88%
HogwartsLegacy-trace-dx12-1080p-ultra:              1.33% ±0.20%
GtaV-trace-dx11-2160p-ultra:                        1.12% ±0.19%
Borderlands3-trace-dx11-2160p-ultra:                1.00% ±0.58%
TerminatorResistance-trace-dx11-2160p-ultra:        0.98% ±0.27%
Control-trace-dx11-1440p-high:                      0.91% ±0.36%
Naraka-trace-dx11-1440p-highest:                    0.90% ±0.30%
Ghostrunner2-trace-dx11-1440p-ultra:                0.87% ±0.38%
Palworld-trace-dx11-1080p-med:                      0.71% ±0.17%

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36618>
2025-09-10 02:15:54 +00:00
Lionel Landwerlin
d6ee5b7177 anv: remove divergence requirement
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Not required since we've disabled maintenance8 support.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: d39e443ef8 ("anv: add infrastructure for common vk_pipeline")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37242>
2025-09-09 21:25:06 +00:00
Mike Blumenkrantz
dee9600ac7 zink: eliminate buffer refcounting to improve performance
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36296>
2025-09-09 20:47:38 +00:00
Mike Blumenkrantz
b3133e250e gallium: add pipe_context::resource_release to eliminate buffer refcounting
refcounting uses atomics, which are a significant source of CPU overhead
in many applications. by adding a method to inform the driver that
the frontend has released ownership of a buffer, all other refcounting
for the buffer can be eliminated

see MR for more details

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36296>
2025-09-09 20:47:38 +00:00
Mike Blumenkrantz
7c1c2f8fce zink: ensure transient surface is created when doing msaa expand
forgetting this can lead to res->transient being NULL

Fixes: ef3f798957 ("zink: prune zink_surface down to the imageview and create/fetch on demand")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37257>
2025-09-09 19:10:25 +00:00
Caio Oliveira
67fcfed67b brw: Add FILE * parameter to dump_assembly
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37259>
2025-09-09 10:40:42 -07:00
Anna Maniscalco
19f32df4ff mailmap: Update my name
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Egg-crAcked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37230>
2025-09-09 16:44:38 +00:00
Mike Blumenkrantz
8eca6ee134 zink: just reference compute progs to batch on delete
eliminates a bunch of pointless refcounting

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37139>
2025-09-09 16:14:16 +00:00
Mike Blumenkrantz
1e07f58c62 zink: do bindless init when binding a bindless shader, not on create
this avoids doing bindless init in multi-context scenarios where one
context is used solely for shader compilation

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37139>
2025-09-09 16:14:16 +00:00
Mike Blumenkrantz
e21438192a zink: set current compute prog after comparing against current compute prog
not sure how this was never caught before now?

cc: mesa-stable

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37139>
2025-09-09 16:14:16 +00:00
Mike Blumenkrantz
3fecf68784 zink: only set compute module info on dispatch (after compile fence)
this otherwise can attempt to access thread data

cc: mesa-stable

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37139>
2025-09-09 16:14:14 +00:00
Lionel Landwerlin
febe90e109 vulkan: remove incorrect assert
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
You can have a group with 0 shaders in it.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 69a04151db ("vulkan/runtime: add ray tracing pipeline support")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13858
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37249>
2025-09-09 13:34:05 +00:00
Rhys Perry
e2181744c2 aco/tests: add barrier-to-waitcnt tests
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36491>
2025-09-09 12:34:40 +00:00
Rhys Perry
0f32b573a4 aco/gfx10: skip waitcnts or use vm_vsrc(0) for workgroup lds barriers
fossil-db (navi21):
Totals from 36594 (45.84% of 79825) affected shaders:
Instrs: 19922581 -> 19922563 (-0.00%)
CodeSize: 103616980 -> 103616956 (-0.00%)
Latency: 69862064 -> 69053273 (-1.16%)
InvThroughput: 14607708 -> 14606308 (-0.01%); split: -0.01%, +0.00%

fossil-db (navi31):
Totals from 1641 (2.06% of 79825) affected shaders:
Instrs: 1247591 -> 1247875 (+0.02%); split: -0.00%, +0.03%
CodeSize: 6259516 -> 6260612 (+0.02%); split: -0.00%, +0.02%
Latency: 7657224 -> 7577299 (-1.04%); split: -1.05%, +0.00%
InvThroughput: 1150669 -> 1148171 (-0.22%); split: -0.22%, +0.00%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36491>
2025-09-09 12:34:40 +00:00
Rhys Perry
ac882985c0 aco/gfx10: skip waitcnts or use vm_vsrc(0) for workgroup vmem barriers
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36491>
2025-09-09 12:34:40 +00:00
Rhys Perry
145b178de2 aco: fix workgroup-scope barrier between vmem and lds
A barrier between two lds/vmem instructions needs to ensure that the
second starts after the first finishes, which means that we can't just
skip workgroup-scope vmem barriers if there is a lds instruction later.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36491>
2025-09-09 12:34:40 +00:00
Rhys Perry
02718fd4c5 aco: use a separate event for sendmsg_rtn
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36491>
2025-09-09 12:34:40 +00:00
Rhys Perry
5812c2ea89 aco: update waitcnt events for exports
Include primitive, dual source blend and POS4 exports.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36491>
2025-09-09 12:34:40 +00:00
Rhys Perry
711c023b55 aco: remove waitcnt code for POPS
We now insert barriers around these instead.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36491>
2025-09-09 12:34:40 +00:00
Rhys Perry
005694fe1f aco: remove waitcnt code for SMEM stores
These were removed in GFX10.3 and we haven't used them in a while.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36491>
2025-09-09 12:34:40 +00:00
Rhys Perry
20cd5cf5f7 aco: delay barrier waitcnt until they are needed
fossil-db (navi21):
Totals from 44 (0.06% of 79825) affected shaders:
Instrs: 16001 -> 15932 (-0.43%); split: -0.46%, +0.02%
CodeSize: 85800 -> 85548 (-0.29%); split: -0.30%, +0.01%
Latency: 190124 -> 173458 (-8.77%)
InvThroughput: 23605 -> 22756 (-3.60%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36491>
2025-09-09 12:34:40 +00:00
Rhys Perry
843acfa50b aco: add a separate barrier_info for release/acquire barriers
These can wait for different sets of accesses.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36491>
2025-09-09 12:34:40 +00:00
Rhys Perry
6c446c2f83 aco: refactor waitcnt pass to use barrier_info
Currently there's just barrier_info_all, but more will be added later.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36491>
2025-09-09 12:34:40 +00:00
Rhys Perry
21332609b9 aco: don't move acquire barriers before interlock begin
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36491>
2025-09-09 12:34:40 +00:00
Rhys Perry
0ee1c137f9 aco: don't move release barriers after interlock end
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36491>
2025-09-09 12:34:40 +00:00
Rhys Perry
7c056dd473 aco: add is_atomic_or_control_instr helper
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36491>
2025-09-09 12:34:40 +00:00
Rhys Perry
df6a3b7619 aco: reduce cost of using values defined in predecessors
For code like:
   if (cond) {
      val = load()
   }
   use(val)
The "use(val)" now has a similar cost to a use inside the IF.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36491>
2025-09-09 12:34:40 +00:00
Gert Wollny
b7ac5d8453 r600/sfn: Optimize pred(not X != 0) to pred(X == 0)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37205>
2025-09-09 12:11:54 +00:00
Gert Wollny
125ce0f909 r600/sfh: Handle 64 bit comparisons in predicate optimization
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37205>
2025-09-09 12:11:54 +00:00
Gert Wollny
abe9b61212 r600/sfn: relax restrictions when optimizing predicate evaluation with a register
If the comparison comes right before the predicate evaluation it still
can be contracted.

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37205>
2025-09-09 12:11:53 +00:00
Gert Wollny
bbbb2be123 r600/sfn: emit 64 bit predicates like normal ALU ops
Also clean up the scheduler changes we did to deal with one-slot and
two slot predicate ops at the same time.

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37205>
2025-09-09 12:11:53 +00:00
Gert Wollny
51d8ca2dff r600/sfn: optimize comparison results
* optimize not(compare(a,b)), nir_opt_algebraic does this only if the
  comparison result is used only once, but on a vector arch we still get
  an advantage when doing this, because it reduces dependencies.
* optimize b2f32(compare(a,b)), this is r600 specific

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37205>
2025-09-09 12:11:53 +00:00
Gert Wollny
82dffae611 r600/sfn: don't use dummy regs in alu ops when no dest register is needed
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37205>
2025-09-09 12:11:53 +00:00
Gert Wollny
4f1f5aa02d r600/sfn: Add handling of channels for dest-less ALU ops
This will be used to get rid of some dummy register handling.

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37205>
2025-09-09 12:11:52 +00:00
Gert Wollny
90b2fbbab4 r600/sfn: Pass chan and dest_clamp to alu op if no dest register is given
v2: move common code

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37205>
2025-09-09 12:11:52 +00:00
Gert Wollny
4dd3951323 r600/sfn: fix op2_pred_sete_64 opcode
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37205>
2025-09-09 12:11:52 +00:00
Georg Lehmann
08b58c3fac nir/lower_subgroups: remove lower_fp64 option
This was incorrect (it also lowered int64 reductions/scans), and the only
user can just use the general callback to precisely only lower what it wants.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37164>
2025-09-09 11:09:22 +00:00
Georg Lehmann
687510495f nir: remove subgroup size related nir_shader_compiler_options members
This was added with the goal to eventually replace the per
pass subgroup/ballot size options, but that won't work because
some backends don't have a fixed subgroup size across the compilation
process.

It was also mostly added to hack around mesa state tracker behavior,
and we have a better solution there now.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37164>
2025-09-09 11:09:22 +00:00
Georg Lehmann
c7d5108373 mesa/st: make double subgroup lowering more precise
Really don't touch anything else.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37164>
2025-09-09 11:09:21 +00:00
Georg Lehmann
9bc14a0047 nir/lower_subgroup: optimize reduce/scans with unknown subgroup size
We skip iterations with ifs.
These can be optimized aways after the subgroup size is known.
Every driver should do that because applications depend on it anyway.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37164>
2025-09-09 11:09:21 +00:00
Rhys Perry
c59a85d406 nir/load_store_vectorize: remove offset check in try_vectorize_shared2
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
This doesn't seem useful anymore.

fossil-db (gfx1201):
Totals from 111 (0.14% of 79839) affected shaders:
Instrs: 152356 -> 151883 (-0.31%); split: -0.35%, +0.04%
CodeSize: 808484 -> 805584 (-0.36%); split: -0.39%, +0.04%
VGPRs: 7880 -> 7844 (-0.46%); split: -0.91%, +0.46%
Latency: 4121366 -> 4120648 (-0.02%); split: -0.04%, +0.02%
InvThroughput: 814622 -> 815362 (+0.09%); split: -0.02%, +0.11%
VClause: 3066 -> 3065 (-0.03%); split: -0.10%, +0.07%
SClause: 2594 -> 2593 (-0.04%)
Copies: 9412 -> 9447 (+0.37%); split: -0.47%, +0.84%
PreSGPRs: 4012 -> 4026 (+0.35%)
PreVGPRs: 4025 -> 4070 (+1.12%); split: -0.22%, +1.34%
VALU: 80457 -> 81039 (+0.72%); split: -0.08%, +0.80%
SALU: 16542 -> 16528 (-0.08%); split: -0.10%, +0.02%
VOPD: 39 -> 44 (+12.82%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36370>
2025-09-09 10:11:52 +00:00
Rhys Perry
0f364aded3 nir/opt_offsets: improve shared2 optimization
Combine additions too, instead of just constant offsets.

fossil-db (gfx1201):
Totals from 97 (0.12% of 79839) affected shaders:
Instrs: 145269 -> 144886 (-0.26%); split: -0.27%, +0.01%
CodeSize: 762184 -> 759556 (-0.34%); split: -0.36%, +0.01%
VGPRs: 5812 -> 5764 (-0.83%)
Latency: 4050681 -> 4050528 (-0.00%); split: -0.01%, +0.00%
InvThroughput: 617458 -> 617181 (-0.04%); split: -0.05%, +0.00%
Copies: 8719 -> 8672 (-0.54%); split: -0.70%, +0.16%
PreVGPRs: 3558 -> 3543 (-0.42%); split: -0.59%, +0.17%
VALU: 77793 -> 77462 (-0.43%); split: -0.44%, +0.01%
SALU: 17028 -> 17009 (-0.11%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36370>
2025-09-09 10:11:51 +00:00
Rhys Perry
c10e495182 nir/opt_offsets: fix progress determination with offsets that add to zero
If the offset is iadd(iadd(iadd(a, 1), b), -1), try_extract_const_addition
will create a dead iadd(a, b) and claim that it didn't modify the shader.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36370>
2025-09-09 10:11:50 +00:00
Rhys Perry
9aad852af8 nir/opt_offsets: report progress if NUW is set
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36370>
2025-09-09 10:11:50 +00:00
Tomeu Vizoso
5eab4f06d5 teflon/tests: Remove dependency on xtensor
Upstream has been moving headers around and breaking users.

Because we don't use it for much right now, drop the dependency
altogether by open coding some rand() helpers.

Issue: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13681
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37220>
2025-09-09 11:07:19 +02:00
Sviatoslav Peleshko
b148d47c3e anv: Always disable Color Blending for unused Render Targets
Some checks are pending
macOS-CI / macOS-CI (dri) (push) Waiting to run
macOS-CI / macOS-CI (xlib) (push) Waiting to run
Commit d2f7b6d5 changed the BLEND_STATE update process so that only
the used render targets will be updated. This mostly works fine, but
in cases when the Dual Source Blending was used previously, we still
must turn it off to avoid the undefined behavior that leads to hangs.

Fixes: d2f7b6d5 ("anv: implement VK_KHR_dynamic_rendering_local_read")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13675
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37246>
2025-09-09 07:38:50 +00:00