nir_deps_state is a struct used as a closure for calculating the
dependencies. Previously it had two fields copied out of the scoreboard.
The closure is initialised in two seperate places. In order to make it
easier to access other members of the scoreboard in the callbacks in
later patches, the closure now just contains a pointer to the scoreboard
and the two bits of copied information are removed.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5561>
load_per_vertex_input should probably be handled in the same way as a
regular load_input. I think the nir_schedule pass was written before V3D
had geometry shader support, so that is probably why it hasn’t taken
this into account until now.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5561>
This is a more explicit name now that we don't want it to be doing any
memory barrier stuff for us.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
This is similar to a scheduler I've written for vc4 and i965, but this
time written at the NIR level so that hopefully it's reusable. A notable
new feature it has is Goodman/Hsu's heuristic of "once we've started
processing the uses of a value, prioritize processing the rest of their
uses", which should help avoid the heuristic otherwise making such
systematically bad choices around getting texture results consumed.
Results for v3d:
total instructions in shared programs: 6497588 -> 6518242 (0.32%)
total threads in shared programs: 154000 -> 152828 (-0.76%)
total uniforms in shared programs: 2119629 -> 2068681 (-2.40%)
total spills in shared programs: 4984 -> 472 (-90.53%)
total fills in shared programs: 6418 -> 1546 (-75.91%)
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> (v1)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> (v2)
v2: Use the DAG datastructure, fold in the scheduling-for-parallelism
patch, include SSA defs in live values so we can switch to bottom-up
if we want.
v3: Squash in improvements from Alejandro Piñeiro for getting V3D to
successfully register allocate on GLES3.1 dEQP. Make sure that
discards don't move after store_output. Comment spelling fix.