The current scheduling algorithm favors parallelism a bit too
aggressively and sometimes generates shaders that fail register
allocation. This happens even if the threshold is set to zero to force
it to always use the CSR instruction choosing algorithm.
This patch adds an option to use an even more aggressive fallback that
just always picks the instruction with the shortest maximum delay in the
hope that that will generate the least register pressure. The intention
is to use this as a last resort after register allocation fails in order
to at least have a working shader.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Acked-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5953>
Adds a callback function to nir_schedule_options to give the backend a
chance to add custom dependencies between certain intrinsics. The
callback can assign a class number to the intrinsic and then set a read
or write dependency on that class.
v2: Use a linked-list of schedule nodes for the dependency classes
instead of a fixed-sized array.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5953>
Instead of copying the individual members of nir_schedule_options into
the scoreboard, it now just keeps a pointer to the options. This avoids
the duplicated comments and makes it easier to add more options later.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5953>
nir_schedule already has a struct for options which makes it more than
just a function declaration. Later patches intend to add more structs to
complement these options. In order to make the code easier to manage,
this moves the nir_scheduler-related parts out of nir.h to their own
header.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5953>
The scheduler has code to handle hardware that shares the same memory
for inputs and outputs. Seeing as the specific stages that need this is
probably hardware-dependent, this patch makes it a configurable option
instead of hard-coding it to everything but fragment shaders.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5561>
nir_deps_state is a struct used as a closure for calculating the
dependencies. Previously it had two fields copied out of the scoreboard.
The closure is initialised in two seperate places. In order to make it
easier to access other members of the scoreboard in the callbacks in
later patches, the closure now just contains a pointer to the scoreboard
and the two bits of copied information are removed.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5561>
load_per_vertex_input should probably be handled in the same way as a
regular load_input. I think the nir_schedule pass was written before V3D
had geometry shader support, so that is probably why it hasn’t taken
this into account until now.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5561>
This is a more explicit name now that we don't want it to be doing any
memory barrier stuff for us.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
This is similar to a scheduler I've written for vc4 and i965, but this
time written at the NIR level so that hopefully it's reusable. A notable
new feature it has is Goodman/Hsu's heuristic of "once we've started
processing the uses of a value, prioritize processing the rest of their
uses", which should help avoid the heuristic otherwise making such
systematically bad choices around getting texture results consumed.
Results for v3d:
total instructions in shared programs: 6497588 -> 6518242 (0.32%)
total threads in shared programs: 154000 -> 152828 (-0.76%)
total uniforms in shared programs: 2119629 -> 2068681 (-2.40%)
total spills in shared programs: 4984 -> 472 (-90.53%)
total fills in shared programs: 6418 -> 1546 (-75.91%)
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> (v1)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> (v2)
v2: Use the DAG datastructure, fold in the scheduling-for-parallelism
patch, include SSA defs in live values so we can switch to bottom-up
if we want.
v3: Squash in improvements from Alejandro Piñeiro for getting V3D to
successfully register allocate on GLES3.1 dEQP. Make sure that
discards don't move after store_output. Comment spelling fix.