Mix of Coccinelle patch, manual fix ups, sed, etc. Probably best to review the diff
as-if hand written:
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38955>
Avoid using exec_node::remove() and the initial "main list of
instructions", and instead use the existing helpers like other
passes.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37146>
Couldn't find in the docs a reference for the types needing to match,
and simulator + MTL seem fine with mixing UD and UW, so not adding
a replacement for the removed assertions.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38877>
Most stages call this as part of brw_nir_postprocess_opts() but mesh
lowers to URB intrinsics after that since it needs bit-sizes lowered.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
We keep this separate from the other lowering infrastructure because
there's no semantic IO involved here, just byte offsets. Also, it needs
to run after nir_lower_mem_access_bit_sizes, which means it needs to be
run from brw_postprocess_opts. But we can't do the mesh URB lowering
there because that doesn't have the MUE map.
It's not that much code as a separate pass, though.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
With all the infrastructure in place, this is largely a matter of
calling the lowering passes with the appropriate data from the MUE map.
MUE initialization is now done with semantic IO instead of raw offsets.
This drops another case of non-standard NIR IO usage (and no_validate).
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
(Split by Ken from a larger patch originally written by Lionel.)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
(Based on the original implementation by Lionel Landwerlin, but adapted
to my respun URB lowering framework.)
The mesh shader URB payload requires reading and writing fields at
arbitrary DWord offsets. For example, the Primitive Indices array
starts at DWord 1, and it can be a vec1[], vec2[], or vec3[] array,
leading to very unaligned and sometimes double-parked elements.
Still, most fields are still conveniently vec4-aligned.
To handle this, we add a new cb_data::vec4_access flag. If set, access
remains in vec4 units, with vec4 alignment. We use this for non-mesh
stages. When unset, offset is in 32-bit units, allowing unaligned
DWord access.
This is trivial to support on Xe2, where the LSC URB messages support
arbitrary byte-aligned addressing. On older platforms, we have to
convert this to vec4 aligned offsets plus a component offset (either
returning a subset of the channels loaded, or using component masking
to store a subset of a vec4/vec8).
Thankfully, since the OWord URB messages support accessing a vec8 at
a time, this means we can do any vec4 access in one message, even if
it's double-parked. We use mod-analysis to see if we can statically
determine the sub-vec4 component offset required (we often can). If
not, we use the ability to have dynamic writemasks to sort it out.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
I forgot to copy this over in the LSC case. This meant we were missing
reorderability which meant that we were missing out on CSE.
fossil-db results on Battlemage:
Instrs: 231471427 -> 231363032 (-0.05%)
Send messages: 12077759 -> 12019628 (-0.48%)
Cycle count: 34058451430.0 -> 34057005552.0 (-0.00%); split: -0.01%, +0.00%
Spill count: 520387 -> 520135 (-0.05%)
Fill count: 470812 -> 470722 (-0.02%)
Max live registers: 72111834 -> 71873886 (-0.33%)
Totals from 2898 (0.37% of 788851) affected shaders:
Instrs: 1223836 -> 1115441 (-8.86%)
Send messages: 148633 -> 90502 (-39.11%)
Cycle count: 17732554.0 -> 16286676.0 (-8.15%); split: -10.65%, +2.49%
Spill count: 252 -> 0 (-inf%)
Fill count: 90 -> 0 (-inf%)
Max live registers: 491684 -> 253736 (-48.39%)
Non SSA regs after NIR: 255397 -> 255402 (+0.00%)
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
This now lowers IO intrinsics to URB intrinsics in a single step,
rather than modifying IO intrinsics to have non-standard meanings
temporarily. We are able to drop one "no_validate" flag.
For example, remap_patch_urb_offsets had added (vertex * stride) to
(offset) for per-vertex IO intrinsics, but left them as per-vertex
intrinsics. Now we just have an urb_offset() function to calculate
that when doing the lowering.
This also provides a central location for calculating URB offsets,
which we should be able to extend for other uses (per-view lowering,
mesh per-primitive lowering) in future patches.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
This block of code will be re-used in a future patch, also it reduces a bit the
size and complexity of lower_sampler_logical_send().
No changes in behavior intended here.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38792>
This is required for upcoming resource barrier work to implement HDC
flush's.
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38707>
Gfx20+ doesn't do PIPELINE_SELECT, the assumption is that we can now
do any PIPE_CONTROL we want regardless of the pipeline mode.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38707>
We experimentally found that some fixed functions have apparently be
hooked up to the L3. So we can drop a some flushing.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38707>
Now that we track the stages, it's not required to add those bits
anymore.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38707>
Now that we have the stages accumulated, we can delay this at flushing
time.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38707>
For a bunch of workarounds and special cases we want PIPE_CONTROL not
RESOURCE_BARRIER. We want emit_apply_pipe_flushes() to be mostly for
application barriers.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38707>