Comparing uint8_t max value 255 with devinfo->verx10 will work fine for
now but for future platforms, comparison will fail. To avoid this
let's switch the field data type from 8-bits to 16-bits.
v1: (Jordan)
- Use 16 bits instead of 32 and add assertion.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25478>
SIMD1 instructions are problematic because they are considered partial
writes. This increases the liveness of the destination register
written by those instructions. To workaround this we use UNDEF
instructions to bound the liveness of the register. But this causing
other issues like in this case :
undef(1) vgrf2
mov(1) vgrf2, u4.0
add(1) vgrf3, vgrf2.0, 64UD
In this case the copy propagation pass in unable to see that vgrf2 in
the add() instruction can be replaced with the uniform u4.0.
To fix this problem, we switch NoMask SIMD8 instructions that cover
the entire register. We can drop the UNDEF instructions and now copy
propagation can do its job.
Good results on 2 apps :
Cyberpunk 2077 :
Totals from 7258 (68.80% of 10549) affected shaders:
Instrs: 6332210 -> 6073833 (-4.08%); split: -4.11%, +0.03%
Cycles: 130667501 -> 127351268 (-2.54%); split: -3.12%, +0.58%
Subgroup size: 90320 -> 90400 (+0.09%)
Spill count: 90 -> 68 (-24.44%)
Fill count: 82 -> 64 (-21.95%)
Scratch Memory Size: 8192 -> 6144 (-25.00%)
Max live registers: 385464 -> 375152 (-2.68%)
Max dispatch width: 64336 -> 64424 (+0.14%); split: +0.96%, -0.82%
Gaining 60 SIMD16/SIMD32 shaders, loosing 33
Strange Brigade :
Totals from 2137 (53.12% of 4023) affected shaders:
Instrs: 1544031 -> 1457544 (-5.60%); split: -5.60%, +0.00%
Cycles: 22292564 -> 21868978 (-1.90%); split: -2.43%, +0.53%
Subgroup size: 25328 -> 25344 (+0.06%)
Max live registers: 113716 -> 111214 (-2.20%)
Max dispatch width: 17232 -> 18608 (+7.99%); split: +8.36%, -0.37%
Gaining 138 SIMD16/SIMD32 shaders, loosing 4
On app slightly negatively affected :
Dota2 :
Totals from 232 (14.73% of 1575) affected shaders:
Instrs: 30029 -> 28194 (-6.11%)
Cycles: 385155 -> 371422 (-3.57%); split: -3.59%, +0.02%
Max live registers: 6792 -> 6780 (-0.18%)
Max dispatch width: 2256 -> 2160 (-4.26%)
Loosing 6 SIMD32 shaders
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24554>
Some recent NIR changes started generated those instructions. We need
to handle them to be able to rematerialize.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24554>
This of course only applies to xe.ko. There is no reason to keep it
disabled by default.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23045>
This pollutes stderr a lot, but I've used it countless times while
developing this code.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23045>
Game testing shows it's common for this operation to result in
multiple bind regions, so try to use a single ioctl when we can.
Actual testing reveals 136 shader-related tests fail when we actually
do this, so for now keep doing a single bind per ioctl while leaving a
very easy way to the desired behavior when we figure this out.
It should also be possible to go even higher-level and do this at the
anv_queue_submit_sparse_bind_locked() layer, but that should happen in
future commits.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23045>
This giant patch implements a huge chunk of the Vulkan Sparse
Resources API. I previously had this as a nice series of many smaller
patches that evolved as the xe.ko added more features, but once I was
asked to squash some of the major reworks I realized I wouldn't be
able easily rewrite history, so I just squased basically the whole
series into a giant patch. I may end up splitting this again later if
I find a way to properly do it.
If we want to support the DX12 API through vkd3d we need to support
part of the the Sparse Resources API. If we don't, a bunch of Steam
games won't work.
For now we only support the xe.ko backend, but the vast majority of
the code is KMD-independent and so an i915.ko implementation would use
most of what's here, just extending the part that binds and unbinds
memory.
v2+: There's no way to sanely track the version history of this patch
in this commit message. Please refer to Gitlab.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23045>
The only thing that changes between these 3 checks is the size.
This entire patch was suggested by Kenneth Graunke, I just converted
his gitlab comment to a git commit.
Credits-to: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23045>
Vulkan Sparse resources have their own set of rules, so here we try to
make ISL aware of them through ISL_SURF_USAGE_SPARSE_BIT.
The big deal here is when some image ends up not using Tile64 nor
TileYs. Previously Ys was not supported on TGL at all, and Tile64 did
not have support for 3D. Now we still have some formats that end up
not being used with either Tile64 and Ys, but need to support Sparse
on them (e.g., YUV on Tile64). In the future we may have new tiling
formats or hardware restrictions that would force this case to happen
again.
So here we do some adjustments so we can make sparse work with other
tiling formats, although with limited functionality (e.g., those
formats may be restricted to opaque binds, and certainly don't support
the standard block shapes).
v2: before we had Ys support, we had defined TGL's block size as 4k.
v3: move the size_B chunk to before nte notify_failure() checks (Ken).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23045>
This is what most logical SEND messages do when they take a variable
number of components. 'inst->mlen' is expected to be zero for logical
SEND opcodes, which are expected to behave like plain arithmetic
operations, so certain automated transformations (like SIMD lowering)
can manipulate them without opcode-specific special-casing.
Guessing the number of components from 'inst->mlen' has other
disadvantages, because it requires duplicating the logic that infers
the message payload size in every use of the instruction -- Instead we
can just do the computation once during logical send lowering. In
addition on LNL platform this causes the 'inst->mlen' field of URB
writes to have units inconsistent with every other SEND instruction,
which is likely to lead to confusion and bugs down the road.
Rework:
* Marcin: update emit_urb_indirect_vec4_write
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25195>
v2: Account for 512b physical registers which causes the URB handle to be in FIXED_GFR 2 instead of 1.
XXX - Use fs_builder::vgrf() instead of open-coded dispatch_width calculations.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25195>
Rework:
* Sqaushed in changes Marcin Ślusarz's patches:
* "intel/compiler: skip adding 0 to payload address"
* "intel/compiler/xe2: drop masking off top 8 bits of URB handle"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25195>
We missed out that ICL+ added a programming requiring a CS_STALL.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25426>
Those are used in the failure paths and are easily retriavable from the
stage itself, so no need to store them.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25367>