Only consider copies with the right file. Otherwise, the spill chooser
will blow up looking for the next use. Also, we were marking
destinations as spilled when I ment to mark them as being in W. This
was a right mess. God thing to_cssa() was also broken so it wasn't
inserting nearly as many parallel copies as it was supposed to. 🙃
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24998>
In the presence of nested loops, there is no guarantee that we can
handle back-edges in a single pass. Instead, use a bitset as a worlist
and repair until we're out of missing sources.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24998>
It was converting instructions to/from Op which throws away predicates.
This meant that any instruction immediately after an OUT.EMIT would
loose its predicate (if any). If an OUT.EMIT comes right before a BRA,
this results in the branch always getting taken.
While reworking things, I also totally reformatted the pass to make it
more readable IMO.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24998>
because
* ptxas refuses inputs with .cluster before sm90
* on sm90 ptxas encodes .cluster as .gpu
* nvdisasm calls this encoding on sm75 .SM
so I don't think we have an actual .cluster on any released gpus
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24998>
This adds 4 NIR intrinsics for attribute I/O which match the Turing
hardware instructions as well as a lowering pass to lower
load/store[_per_vertex]_input/output to thewe intrinsics. This greatly
simplifies nak_to_nir.rs. Also, this pass is able to handle a bunch of
cases that the current code in nak_from_nir.rs can't:
- Misaligned access (i.e., a vec3 load at 0x0f4)
- Write masks on store[_per_vertex]_output
- Indirect load/store where we need to use AL2P to get physical
addresses, including scalarizing those cases
It also handles the casses where we need to use ISBERD on vertex indices
in the same pass. When we switch to this, we'll rip out the dedicated
per_vertex lowering pass.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24998>
We could lower this to load_per_vertex_output in NIR but then it
confuses all sorts of NIR passes which assume load_*output only
happens in control shaders. We could also add a magic NIR intrinsic
but it's probably easier to just special-case this one.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24998>
This commits implement load/store_per_vertex* and load_output which are
required for tessellation shaders. Because things are getting a bit
complicated, it's easier to combine all the attribute load/store ops in
a single case in the match.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24998>
NVIDIA hardware doesn't take a vertex index for per-vertex I/O.
Instead, it takes an offset into the primitive. This has to be fetched
using a combination of SR_INVOCATION_INFO and the ISBERD instruction.
To keep things simple and allow for maximum CSE, we do the lowering in
NIR and patch the load/store_per_vertex_input/output intrinsic.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24998>
We also need to call nir_lower_indirect_derefs. Even though we're not
lowering anything away, we need to call it so that it will lower clip
distance arrays. The pass always lowers compact array derfs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24998>