Couldn't find in the docs a reference for the types needing to match,
and simulator + MTL seem fine with mixing UD and UW, so not adding
a replacement for the removed assertions.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38877>
Most stages call this as part of brw_nir_postprocess_opts() but mesh
lowers to URB intrinsics after that since it needs bit-sizes lowered.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
We keep this separate from the other lowering infrastructure because
there's no semantic IO involved here, just byte offsets. Also, it needs
to run after nir_lower_mem_access_bit_sizes, which means it needs to be
run from brw_postprocess_opts. But we can't do the mesh URB lowering
there because that doesn't have the MUE map.
It's not that much code as a separate pass, though.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
With all the infrastructure in place, this is largely a matter of
calling the lowering passes with the appropriate data from the MUE map.
MUE initialization is now done with semantic IO instead of raw offsets.
This drops another case of non-standard NIR IO usage (and no_validate).
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
(Split by Ken from a larger patch originally written by Lionel.)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
(Based on the original implementation by Lionel Landwerlin, but adapted
to my respun URB lowering framework.)
The mesh shader URB payload requires reading and writing fields at
arbitrary DWord offsets. For example, the Primitive Indices array
starts at DWord 1, and it can be a vec1[], vec2[], or vec3[] array,
leading to very unaligned and sometimes double-parked elements.
Still, most fields are still conveniently vec4-aligned.
To handle this, we add a new cb_data::vec4_access flag. If set, access
remains in vec4 units, with vec4 alignment. We use this for non-mesh
stages. When unset, offset is in 32-bit units, allowing unaligned
DWord access.
This is trivial to support on Xe2, where the LSC URB messages support
arbitrary byte-aligned addressing. On older platforms, we have to
convert this to vec4 aligned offsets plus a component offset (either
returning a subset of the channels loaded, or using component masking
to store a subset of a vec4/vec8).
Thankfully, since the OWord URB messages support accessing a vec8 at
a time, this means we can do any vec4 access in one message, even if
it's double-parked. We use mod-analysis to see if we can statically
determine the sub-vec4 component offset required (we often can). If
not, we use the ability to have dynamic writemasks to sort it out.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
I forgot to copy this over in the LSC case. This meant we were missing
reorderability which meant that we were missing out on CSE.
fossil-db results on Battlemage:
Instrs: 231471427 -> 231363032 (-0.05%)
Send messages: 12077759 -> 12019628 (-0.48%)
Cycle count: 34058451430.0 -> 34057005552.0 (-0.00%); split: -0.01%, +0.00%
Spill count: 520387 -> 520135 (-0.05%)
Fill count: 470812 -> 470722 (-0.02%)
Max live registers: 72111834 -> 71873886 (-0.33%)
Totals from 2898 (0.37% of 788851) affected shaders:
Instrs: 1223836 -> 1115441 (-8.86%)
Send messages: 148633 -> 90502 (-39.11%)
Cycle count: 17732554.0 -> 16286676.0 (-8.15%); split: -10.65%, +2.49%
Spill count: 252 -> 0 (-inf%)
Fill count: 90 -> 0 (-inf%)
Max live registers: 491684 -> 253736 (-48.39%)
Non SSA regs after NIR: 255397 -> 255402 (+0.00%)
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
This now lowers IO intrinsics to URB intrinsics in a single step,
rather than modifying IO intrinsics to have non-standard meanings
temporarily. We are able to drop one "no_validate" flag.
For example, remap_patch_urb_offsets had added (vertex * stride) to
(offset) for per-vertex IO intrinsics, but left them as per-vertex
intrinsics. Now we just have an urb_offset() function to calculate
that when doing the lowering.
This also provides a central location for calculating URB offsets,
which we should be able to extend for other uses (per-view lowering,
mesh per-primitive lowering) in future patches.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
This block of code will be re-used in a future patch, also it reduces a bit the
size and complexity of lower_sampler_logical_send().
No changes in behavior intended here.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38792>
This is required for upcoming resource barrier work to implement HDC
flush's.
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38707>
Gfx20+ doesn't do PIPELINE_SELECT, the assumption is that we can now
do any PIPE_CONTROL we want regardless of the pipeline mode.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38707>
We experimentally found that some fixed functions have apparently be
hooked up to the L3. So we can drop a some flushing.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38707>
Now that we track the stages, it's not required to add those bits
anymore.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38707>
Now that we have the stages accumulated, we can delay this at flushing
time.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38707>
For a bunch of workarounds and special cases we want PIPE_CONTROL not
RESOURCE_BARRIER. We want emit_apply_pipe_flushes() to be mostly for
application barriers.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38707>
Keep only the metadata when initially parsing the files. Then re-load
the relevant archives again when necessary.
The old code was just keeping everything in memory, which was slow when
looking at a directory containing archives resulted from processing
a large fossil file.
Extra care is taken with `search` commands to ensure we don't keep
unnecessary contents around. At some point we could reorganize so
find_all is not used here, but for now this should be fine.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38228>
URB messages on Xe2 are LSC messages with FLAT addressing. We can
specify a S19 immediate offset in the extended message descriptor,
which should be more than adequate to hold any offsets we need.
We wrote the original URB code before implementing that, and never
doubled back to take advantage of it. But doing so can drop ADDs
near every URB access.
fossil-db results on Battlemage:
Totals:
Instrs: 232239759 -> 231432254 (-0.35%)
Cycle count: 34044435848.0 -> 34055507100.0 (+0.03%); split: -0.00%, +0.04%
Spill count: 520370 -> 520362 (-0.00%); split: -0.00%, +0.00%
Fill count: 470790 -> 470803 (+0.00%); split: -0.00%, +0.00%
Max live registers: 72111853 -> 72111369 (-0.00%); split: -0.00%, +0.00%
Totals from 227920 (28.89% of 788851) affected shaders:
Instrs: 59841897 -> 59034392 (-1.35%)
Cycle count: 683385208.0 -> 694456460.0 (+1.62%); split: -0.14%, +1.76%
Spill count: 17278 -> 17270 (-0.05%); split: -0.10%, +0.06%
Fill count: 17481 -> 17494 (+0.07%); split: -0.03%, +0.10%
Max live registers: 23052652 -> 23052168 (-0.00%); split: -0.00%, +0.00%
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38899>
I recently converted urb->offset to be in bytes on Xe2, but neglected to
update these comments that still said OWord.
Fixes: 9ffae42975 ("brw: Store brw_urb_inst::offset in bytes on Xe2")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38899>
It seems placing the shader at the end has a negative performance
impact.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 8ba197c9ef ("anv: Switch shaders to dedicated VMA allocator")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38900>