I noticed we disable the prefetch only on Gfx12.5, but surely that
recommendation carries over to later platforms.
Other drivers seem to disable it all the time and only provide an
option to force the prefetch, so implement the same thing here.
Blorp path is left untouched.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39424>
cef8eff74d ("radv/video: Override H265 SPS unaligned resolutions")
fixes the case where the app specifies a resolution with lower than the
required alignment. With a higher alignment, however, the stream is still
not correctly decodable.
Use size from session params to set the coded size, instead of using
codedExtent of input image.
Only use codedExtent to calculate padding.
Fixes dEQP-VK.video.encode.h265.quantization_map_delta*
Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39529>
Within the driver, buffers are treated as 2D because sampling them as 1D
would run into HW restrictions on max size.
The compiler does the same. However, for atomic image ops the address
is calculated manually, and doing this via the 2D path leads to
incorrect offsets.
The fix is to treat buffers as 1D for atomic ops, which yields
the correct offsets for the operations.
Fixes dEQP:
dEQP-VK.image.atomic_operations.add.buffer.*
dEQP-VK.image.atomic_operations.and.buffer.*
dEQP-VK.image.atomic_operations.compare_exchange.buffer.*
dEQP-VK.image.atomic_operations.dec.buffer.*
dEQP-VK.image.atomic_operations.exchange.buffer.*
dEQP-VK.image.atomic_operations.inc.buffer.*
dEQP-VK.image.atomic_operations.max.buffer.*
dEQP-VK.image.atomic_operations.min.buffer.*
dEQP-VK.image.atomic_operations.or.buffer.*
dEQP-VK.image.atomic_operations.sub.buffer.*
dEQP-VK.image.atomic_operations.xor.buffer.*
Fixes: 6dc5e1e109 ("pco: fully support Vulkan 1.2 image atomics")
Signed-off-by: Nick Hamilton <nick.hamilton@imgtec.com>
Reviewed-by: Simon Perretta <simon.perretta@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39521>
This reverts commit 6eadcaa851.
VK_EXT_primitives_generated_query has a dependency on
VK_EXT_transform_feedback, which we do not implement yet. This is
breaking the Android CTS. It will be re-enabled once transform feedback
is in.
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39547>
For mesh/task shaders, the thread payload provides a local invocation
index, but it's always linear so it doesn't give the correct value when
quad derivatives are in use.
The lowering pass that does all of this correctly for compute
shaders assumes load_local_invocation_index will be lowered in the
backend for mesh/task. It calculates the values for the quads correctly but
then avoids replacing the original intrinsic, leaving us with the wrong
results.
Add an Intel-specific intrinsic and always lower the generic one to it
(or to whatever else was calculated) to avoid ambiguity and fix the value
for quad derivatives.
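As a rough illustration of why a linear index is wrong with quad derivatives, here is a minimal sketch that assumes every four consecutive invocations form one 2x2 quad laid out row by row across the workgroup. The function name and the exact arrangement are assumptions for illustration; the actual lowering in the driver may differ:

```c
#include <assert.h>
#include <stdint.h>

/* Map a hardware-linear invocation number to the local invocation index
 * it should report when invocations are grouped into 2x2 quads.
 * 'width' is the workgroup width in invocations and must be even. */
static uint32_t quad_local_invocation_index(uint32_t i, uint32_t width)
{
   assert(width != 0 && width % 2 == 0);

   uint32_t quad = i / 4;               /* which 2x2 block */
   uint32_t quads_per_row = width / 2;

   /* Position inside the workgroup: block origin plus offset in block. */
   uint32_t x = 2 * (quad % quads_per_row) + (i & 1);
   uint32_t y = 2 * (quad / quads_per_row) + ((i >> 1) & 1);

   return y * width + x;
}
```

With a width of 4, invocations 2 and 3 land on the second row (indices 4 and 5), so the reported index no longer matches the linear payload value.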
Fixes future CTS tests using mesh/task shaders under:
dEQP-VK.spirv_assembly.instruction.compute.compute_shader_derivatives.*
Fixes: d89bfb1ff7 ("intel/brw: Reorganize lowering of LocalID/Index to handle Mesh/Task")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39276>
Instead of unconditionally emitting the dither table during GPU state
reset, only emit it when alpha_to_coverage is actually enabled in
the blend state. A tracking flag avoids redundant re-emission until the
next GPU state reset.
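The tracking scheme can be sketched as follows. All names here (`hyp_context`, `hyp_gpu_state_reset`, `hyp_emit_blend_state`) are hypothetical stand-ins rather than the driver's actual structures, and the counter merely stands in for the real table emission:

```c
#include <stdbool.h>

/* Hypothetical context holding the tracking flag. */
struct hyp_context {
   bool dither_table_emitted;
   unsigned emit_count;      /* counts emissions, for illustration only */
};

/* A GPU state reset invalidates the previously emitted table. */
static void hyp_gpu_state_reset(struct hyp_context *ctx)
{
   ctx->dither_table_emitted = false;
}

/* Emit the table lazily: only when alpha-to-coverage is enabled and the
 * table has not been emitted since the last reset. */
static void hyp_emit_blend_state(struct hyp_context *ctx, bool alpha_to_coverage)
{
   if (alpha_to_coverage && !ctx->dither_table_emitted) {
      ctx->emit_count++;     /* stand-in for the actual table emission */
      ctx->dither_table_emitted = true;
   }
}
```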
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39557>
The wording in the RDNA3 ISA doc has since been clarified: v_cmpx with DPP
behaves exactly as one would expect:
FI controls whether the source value can be read from inactive lanes,
but inactive lanes always write a 0 bit. The same applies to v_cmp with DPP.
Foz-DB Navi48:
Totals from 987 (1.20% of 82405) affected shaders:
Instrs: 517003 -> 516445 (-0.11%); split: -0.11%, +0.00%
CodeSize: 2782688 -> 2780508 (-0.08%); split: -0.08%, +0.00%
Latency: 2059169 -> 2056327 (-0.14%); split: -0.14%, +0.00%
InvThroughput: 365374 -> 365328 (-0.01%); split: -0.03%, +0.01%
Copies: 64669 -> 65616 (+1.46%)
SALU: 70693 -> 70652 (-0.06%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39516>
Allow surface redescription when fast-clearing a layer > 0. This affects
at least five traces in the performance CI, but the CI doesn't report
any performance benefit from this. We already had code to handle unaligned
rows at the bottom of an image. Now that this handles the misalignment at
the top of the image range, we gain some symmetry.
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
On Xe2+, support multi-layer and non-zero-layer CCS fast-clears. To do
this in a simple manner, drop the code which splits multi-layer clears
into fast clears and slow clears. The performance CI reports neither
regressions nor improvements on BMG.
For MCS on all platforms and for CCS on prior platforms, use a new
heuristic. Instead of only allowing fast clears on the first
slice/layer, do the following:
For 3D images, only fast-clear if all slices are cleared. Enables
fast-clearing every slice of 3D textures in:
* Terminator Resistance - 480x270x128.
* Ghostrunner 2 - 320x180x128.
For 2D arrays, match the Xe2+ behavior and allow clearing to any layer.
This is possible because we only allow fast-clearing if the clear color
matches the default value. Enables fast-clearing every layer of 2D array
textures in:
* Assassin's Creed - 128x128, 6-layers.
* Blackops 3 - 1024x1024, 6-layers.
* Borderlands 3 - 128x128, 6-layers.
* Cyberpunk - 1024x1024, 10-layers.
* Unigine Superposition - 4K, 2-layers.
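The heuristic for MCS and pre-Xe2 CCS can be sketched roughly as below. The struct, enum, and function names are hypothetical, and every other fast-clear condition the driver checks (formats, layouts, alignment, and so on) is deliberately left out:

```c
#include <stdbool.h>
#include <stdint.h>

enum hyp_image_dim { HYP_DIM_2D, HYP_DIM_3D };

/* Hypothetical description of one clear request. */
struct hyp_clear {
   enum hyp_image_dim dim;
   uint32_t base_layer;      /* first layer or slice being cleared */
   uint32_t layer_count;     /* number of layers or slices being cleared */
   uint32_t total_layers;    /* array layers, or depth slices for 3D */
   bool color_is_default;    /* clear color matches the default value */
};

static bool hyp_can_fast_clear(const struct hyp_clear *c)
{
   /* 3D images: only fast-clear when every slice is cleared. */
   if (c->dim == HYP_DIM_3D)
      return c->base_layer == 0 && c->layer_count == c->total_layers;

   /* 2D arrays: any layer may be fast-cleared, which relies on the clear
    * color being restricted to the default value. */
   return c->color_is_default;
}
```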
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11893
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
A future commit will enable clearing to more than the first layer of 2D
array images. To ensure consistency for the clear color, require the
ANV_FAST_CLEAR_DEFAULT_VALUE for such images if they make use of
ISL_AUX_STATE_CLEAR. Also, use a non-zero default value for some image
formats.
I tested the majority of workloads in the performance CI. This will
cause those which clear to 2D array layers to gain clears on more than
just the first layer. At the moment, we still only support clearing the
first layer, so there should be no change in performance. Affected games
are documented in the code.
Acked-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
Don't return early from anv_layout_to_fast_clear_type() for Xe2+. We'll
need to make more use of the function for some MCS changes in later
commits.
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
Now that hasvk is the driver for supporting HSW and BDW, we no longer
need to convert CCS_D partial resolves to full resolves to avoid an
assert-failure in BLORP.
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
This will make handling fast-clears on multiple layers simpler by saving
us from having to pass more parameters into fast-clear state setting
functions.
It also allows us to set more complex fast-clear state for FCV_CCS_E
without marking the image as compressed.
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
Enables more support for FCV_CCS_E partial resolves if we ever need it.
Also enables support for multiple layers being fast cleared and needing
resolves. Support for that will arrive in several commits.
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
We started allowing non-default clear colors with FCV in commit
cd8e120b97. When rendering to an image with FCV, set the fast-clear
type to ANV_FAST_CLEAR_ANY if the image properties allow such
fast-clears.
Fixes: cd8e120b97 ("anv: Allow more single subresource fast-clears with FCV")
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>
On Xe2+, HSD 14011946253 and the related documents explain that MCS
still only supports a single clear color.
Fixes: df006bba02 ("iris: Update aux state for color fast clears (xe2)")
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37660>