From a8fafc0f32023c076be9720d6a231ccfb1415a0f Mon Sep 17 00:00:00 2001
From: Kenneth Graunke
Date: Sat, 9 Sep 2023 03:25:35 -0700
Subject: [PATCH] intel/elk: Re-run register allocation once after recreating
 the graph

Our backend does a somewhat unusual sequence:

1. Set up the interference graph
2. Try to register allocate
3. Fail and realize we have to spill
4. Recreate(!) the interference graph with different node counts,
   because unfortunately spills and fills may need temporary registers
   set aside for that purpose, which can no longer be used generally.
5. Ask for the best spill node because we know we must spill

On step 4, ra_realloc_interference_graph() reallocs the in_stack bitset
for the new nodes.  However, it leaves the new bitset words
uninitialized, because it's supposed to be set up by ra_select().

On step 5, however, the Intel backend calls ra_get_best_spill_node()
_without_ first calling ra_select() (or ra_allocate()).  So at that
point, the in_stack bitset is not properly initialized, and we'll end
up reading uninitialized garbage in ra_get_best_spill_node(), and
non-deterministically end up skipping candidates for spilling.

While debugging this, I observed ra_get_best_spill_node() seeing
non-zero in_stack bits set while g->tmp.stack_count was 0.  So no
nodes could possibly be in the stack.

We could simply initialize the memory, but there's a deeper problem:
in Chaitin-Briggs allocators, the list of spill candidates is built in
the "Select" step.  In our implementation, we technically don't make a
list of candidates, but rather flag registers that *aren't* candidates.
By never running ra_allocate() on our new graph, we never produce that
info.  So when we ask for a spill node, we consider *all* registers as
spill candidates, which is far from ideal.

To fix this, we simply call ra_allocate() to rebuild that information
on the new graph.
It's worth noting that the rebuilt information may not be quite the
same as what we had for our old graph, as we reserved additional
registers, increasing interference.

This escaped our notice for a long time because our allocation loop
tries to spill a single register, tries to allocate, and repeats if it
fails.  Because retrying calls ra_select(), which initializes the
spill candidate info, this non-determinism only affected the first
register selected.  However, the backend recently gained support for
spilling multiple registers in each loop step, which highlighted this
problem: different per-step spill sizes produced different results due
to this non-determinism.

Cc: mesa-stable
Fixes: e99081e76d4 ("intel/fs/ra: Spill without destroying the interference graph")
---
 src/intel/compiler/elk/elk_fs_reg_allocate.cpp | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/src/intel/compiler/elk/elk_fs_reg_allocate.cpp b/src/intel/compiler/elk/elk_fs_reg_allocate.cpp
index ca31dbc43f8..17bfdaf4792 100644
--- a/src/intel/compiler/elk/elk_fs_reg_allocate.cpp
+++ b/src/intel/compiler/elk/elk_fs_reg_allocate.cpp
@@ -935,6 +935,19 @@ elk_fs_reg_alloc::choose_spill_reg()
    if (!interference_graph_supports_spilling) {
       discard_interference_graph();
       build_interference_graph(true);
+
+      /* ra_get_best_spill_node() relies on ra_allocate() having been called
+       * once to set up the stack of trivially colorable and optimistically
+       * colored nodes.  By torching and rebuilding our interference graph,
+       * we also discarded the information needed to pick spill candidates.
+       *
+       * The simplest (if expensive) solution is to call ra_allocate() again
+       * on the new graph.  This can't succeed - allocation already failed on
+       * our old graph which had fewer constraints - but it creates the list
+       * of spill candidates for our new more constrained graph.
+       */
+      ASSERTED bool allocated = ra_allocate(g);
+      assert(!allocated);
    }
 
    if (!have_spill_costs)