on top of scheduler changes, compile-time of shaders/blender/1017.shader_test:
Difference at 95.0% confidence
-0.00173202 +/- 0.00116931
-0.791537% +/- 0.532384%
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41688>
Consider:
u0 = foo()
if (divergent) {
u0 = bar()
r0 = baz(u0)
} else {
r0 = quux(u0)
}
Logically, this is fine, there is no interference between bar() and u0. But
physically, both sides of the if execute so the bar() write to u0 overwrites the
variable the else reads. So this is a miscompile.
The solution is to model the extra edges in the physical control flow graph,
which lives next to the existing logical control flow graph. Liveness for UGPRs
now follows the physical CFG, while liveness for GPRs continues to follow the
logical CFG. That models the interference properly, while still allowing phis to
work as before (since phis writing UGPRs follow uniform bits of control flow
that are necessarily critical edge free for the same reason the logical CFG is).
Because our RA copies shuffled registers back at block ends (following
Colombet), there's no issue with live range splits here (unlike aco which
inserts phis for this case and then needs to worry about critical edges around
those phis).
There might still be an extremely-challenging-to-hit bug here with UGPR spilling
which I need to think more about. It might be fine as-is? Not convinced though.
But this is big enough and strictly less broken than what we have right now and
the full solution will build on this, so here we are.
Fixes artefating in SuperTuxKart and Celestia knows what else.
Totals:
Instrs: 2770938 -> 2771269 (+0.01%); split: -0.00%, +0.02%
CodeSize: 40133712 -> 40138480 (+0.01%); split: -0.01%, +0.02%
Totals from 158 (5.97% of 2647) affected shaders:
Instrs: 514523 -> 514854 (+0.06%); split: -0.02%, +0.09%
CodeSize: 7603040 -> 7607808 (+0.06%); split: -0.03%, +0.09%
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41215>
Jay is a new SSA-based compiler for Intel GPUs. This is an early
work-in-progress. It isn't ready to ship, but we'd like to move development in
tree rather than rebasing the world every week. Please don't bother testing yet
- we know the status and we're working on it!
Jay's design is similar to other modern NIR backends, particularly ACO, NAK and
AGX. It is fully SSA, deconstructing phis after RA. We use a Colombet register
allocator similar to NAK, allowing us to handle Intel's complex register
regioning restrictions in a straightforward way. Spilling logical registers is
straightforward with Braun-Hack.
Thanks to the SSA-based design, the entire backend is essentially linear time,
regardless of register pressure, addressing brw's excessive compile time when
especially spilling with brw.
In this current early draft, we support a limited subset of all three APIs on
Xe2. A lot works and a lot doesn't. The core compiler is there (spilling,
scoreboarding, SIMD32, etc should more or less work), but there are details to
fill in for both performance and correctness. We essentially pass conformance on
OpenGL ES 3.0 and OpenCL 3.0, and we're busy iterating on Vulkan.
Likewise, additional hardware support will come down the line. There's nothing
fundamentally Xe2-specific here. I just have a Lunarlake laptop on my desk, Ken
has a Battlemage card, and we had to pick _something_ as the first target.
Co-authored-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40835>