Xe2 can easily support fast-clearing depth buffers to multiple clear
values. Instead of assuming a hard-coded value in various parts of the
driver, pass the clear value down the expected paths.
For consistency, also adjust the slow depth clear function to have a
matching parameter.
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30767>
Some games such as Marvel's Spider-Man Remastered and Assassin's
Creed: Valhalla don't work in debug mode because they hit this
assertion. In Release mode, they appear to work (although in some
platforms there may be visual corruption or GPU hangs). There's
nothing we can do about this error (see below), so in this patch we
replace the assertion with an error message, because it allows us to
(i) test the rest of the game in debug mode so we may catch other
issues; and (ii) warn users of release mode that the issue is
happening.
The unsupported num_elements comes from vkGetDescriptorEXT() and
appears to be violating VUID-VkDescriptorGetInfoEXT-type-09427. This
function cannot return errors, but we can disable
VK_EXT_descriptor_buffer.
If we do disable the extension, then vkCreateBufferView() will start
triggering the assertion, and we can see that
VkBufferViewCreateInfo-range-00930 is being violated. If we change Anv
to return errors on these vkCreateBufferView() cases, then the games
won't work at all.
I reported this to vkd3d-proton, but according to the vkd3d-proton
developer Philip Rebohle:
"There's also the problematic case of games using typed descriptors
but passing non-typed buffer descriptors, which is an extremely
common app bug that works on all D3D12 drivers that we need to work
around by creating typed views. If that's what's happening here then
the best we can do is to just not create the typed view and have the
game be broken entirely, or create a smaller view and most likely
still completely break the game, but at least that way it wouldn't
trigger Vulkan validation. Emulating larger views via multiple
smaller views is not possible for us."
"Confirmed that it's the app itself creating these views."
"D3D12 does not have runtime validation for this or any sort of query
for the app, so we really can't do much here."
Link: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9963
Link: https://github.com/HansKristian-Work/vkd3d-proton/issues/2071
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30775>
We want to make sure the workaround_address is the last item in the BO
so that we don't have to care about the size of the writes going
there, we'll be sure they won't overwrite other items in that BO.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 7b9400b7f7 ("intel/blorp: Don't use clear color conversion on gfx12")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11775
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30844>
The workaround BO has some debug information at the beginning. The
workaround address is placed after that.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30844>
All drivers update the clear color themselves. So, drop the
functionality from BLORP as well as the flag controlling it.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30824>
We're going to be storing clear colors from the drivers rather than
BLORP. Add a function for this purpose.
For now, the first use replaces init_fast_clear_color(). One change in
behavior is that the clear color initialization is now done without
write-checking on gfx12. This actually matches what anv does to other
writes to the image's fast-clear tracking state. We can fix this later
if and when we address the larger issue.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30824>
The default state doesn't seem well defined (or kernel driver bug
maybe?). Let's just set it to disabled on platforms where we're not
using it.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Found-by: Chuansheng Liu <chuansheng.liu@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30841>
The long names were originally meant to map to the HW encoding but
nowadays the actual encoding values depend on gfx version, whether
instruction is 3src, etc.
Suggested by Ken.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30704>
Create specific helper for register file encoding and handle it there.
Use ad-hoc structs to let the macro take optional named arguments.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30704>
In these cases there's a clear bound we can use. In C++ this is a
compiler extension and not compatible with zero initializing a
regular struct -- which will happen in a later change.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30704>
The hardware's clear color conversion feature requires invalidating the
texture cache for every fast clear. We're no longer using the hardware
feature, so we longer need the invalidation.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30646>
Instead of using the clear color conversion feature by the hardware, use
software to write out the converted clear color pixel.
When testing a patch which moves a state cache invalidate to occur after
fast clears instead of before, this prevents the following failures on
tgl/zink:
* piglit.spec.arb_texture_cube_map_array.arb_texture_cube_map_array-cubemap
* piglit.spec.ext_framebuffer_object.fbo-generatemipmap-formats
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30646>
The hardware's clear color conversion feature unfortunately requires
invalidating the texture cache for every fast clear. To avoid the
performance penalty that comes with the invalidation, avoid using the
hardware feature and write out the converted clear color pixel
ourselves.
When testing a patch which moves a state cache invalidate to occur after
fast clears instead of before, this prevents the following failures on
icl/zink:
* piglit.fast_color_clear.fcc-read-after-clear sample tex
* piglit.spec.arb_clear_texture.arb_clear_texture-cube
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30646>
Prevents the next patch from failing many multisampled, signed integer
rendering tests. For example:
dEQP-VK.renderpass2.suballocation.multisample_resolve.r8_sint.samples_4
Cc: mesa-stable
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30646>
Following up on f8c0a99d52 ("anv: emit conditional after gfx state
flushing"), this should have been applied everywhere.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 0147908a89 ("anv: predicate emission of STATE_BASE_ADDRESS")
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30803>
The other dump methods in this file also take a file parameter,
defaulting to stderr. Dumping dot files to stdout is probably not
what anybody really wanted.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30530>
Post-register allocation, but before brw_fs_lower_vgrfs_to_fixed_grfs,
we have registers with the VGRF file but they are actually fixed GRFs.
brw_print_instructions_to_file() was seeing VGRFs and trying to access
their size, but using bogus register numbers that could be out-of-bound.
Detect when we're post-RA and avoid doing this.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30530>
These are pure VK-CTS jobs, they don't run any GL tests.
It doesn't matter right now because these two jobs are disabled, but
when they get re-enabled, we'll want this to have been fixed.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30677>
Create a new stage called intel-postmerge and move the full and manual
jobs over there, to avoid entanglement with the pre-merge jobs.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30784>
We were specifying align_offset = 64 and align_mul = 64, which is
invalid. nir_combined_align() asserts that align_offset < align_mul.
Our intention here is to perform cacheline-aligned (64B-aligned) block
loads, so we should set align_mul = 64 and can leave align_offset = 0.
Fixes: fbafa9cabd ("intel/nir: remove load_global_const_block_intel intrinsic")
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30755>
At least one ray tracing shader in cp2077 is over 4MB on Xe2. There
isn't a memory pool large enough for the allocation, so the driver
crashes instead. This commit adds 8MB and 16MB pools.
I intend this as a stop gap fix. I would prefer to figure out why this
shader is so much larger than on previous platforms. The shader in
question has 3824 spills and 8625 fills. That is not good. I suspect
dealing with that will also solve the problem, but that will require a
bit more time.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11739
Suggested-by: Lionel Landwerlin
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30751>
We were currently emitting logical atomic instructions with a packed
destination region for sub-dword LSC atomics, along the lines of:
> untyped_atomic_logical(32) dst<1>:HF, ...
However, these instructions use an LSC data size D16U32, which means
that the 16b data on the return payload is expanded to 32b by the LSC
shared function, so we were lying to the compiler about the location
of the individual channels on the return payload, its execution
masking, etc. This is why the hacks that manually set the
'inst->size_written' of the instruction were required.
In some cases this worked, but any non-trivial manipulation of the
instruction destination by lowering or optimization passes could have
led to corruption, as has been reproduced in deqp-vk during
lower_simd_width() for shaders that use 16-bit atomics in SIMD32
dispatch mode.
Note that LSC sub-dword reads aren't affected by this because they use
raw UD destinations and specify the actual bit size of the operation
datatype as the immediate SURFACE_LOGICAL_SRC_IMM_ARG, which doesn't
work for atomic operations since that immediate specifies the atomic
opcode.
Instead, have the logical operation implement the behavior of 16-bit
destinations correctly instead of silently replacing the 16-bit region
with an inconsistent 32-bit region -- This is done by emitting the MOV
instructions used to pack the data from the UD temporary into the
packed destination from the lower_logical_sends() pass instead of from
the NIR translation pass.
Fixes: 43169dbbe5 ("intel/compiler: Support 16 bit float ops")
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30683>
This improves drivers in the following ways:
* iris_hiz_exec() and crocus_hiz_exec() gets rid of the narrowly-used
update_clear_depth parameters.
* iris avoids fast-clearing if the aux state is CLEAR. crocus avoids
this too, but didn't actually need it in the first place.
* iris updates the value once per fast_clear_depth() call instead of
doing an update for each layer being cleared.
* anv now updates the clear value when transitioning from an undefined
layout instead of doing so on every fast-clear. This should be safer
because we don't perform state cache invalidates when changing the
clear value. So, existing surface states won't have any stale values.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30520>
The new workaround tries to strike a balance between simplicity and
functionality (for testing purposes). Instead of checking for the
alignment of a specific LOD when fast-clearing, we take an
all-or-nothing approach for LOD1+.
I haven't found any app to clear LOD1+ except for a Dirt Rally trace
some time ago. If I remember correctly, that trace clears all LODs,
doesn't render to them, then clears again with a different color,
incurring resolves. So, skipping LOD1+ fast clears will avoid those
resolves. Other apps I tested include Synmark2, glmark2, GfxBench5, and
the Vulkan games in internal our benchmarking tool.
Now that we've added updated and simplified checks in the drivers
themselves, we delete blorp_can_hiz_clear_depth.
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30250>
None of our tracked games use partial depth clears, so only allow it in
simple cases for testing purposes. This change also fixes an issue on
gfx8, where we had been accidentally disabling full surface clears if
the LOD was not 8x4 aligned.
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30250>
I did some more debugging of this feature, but this time with a modified
version of the piglit test, ./bin/depthstencil-render-miplevels.
I modified the test to:
* Control which LOD to stop populating/clearing
* Print out the results of readpixels to stderr
From there, I could see how different surface dimensions affected
fast-clears. Depending on the surface dimensions, fast-clearing an LOD
above the LOD0 could cause other LODs to be cleared and/or cause the
targeted LOD to be only partially cleared (for example, when the LOD0
dimension is 66x66 and the test doesn't clear LOD3+). This never happens
when fast-clearing LOD0 however.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5258
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30250>
For correct fast-clearing with HiZ+CCS, we require roughly 16x8
alignment of LODs. The next patch will cause drivers to ignore the
alignment of LOD0, so align the qpitch to 8 to avoid breakage and so
that fast clears will be enabled more often.
Prevents failures with the piglit test case:
./bin/fbo-depth-array depth-clear -fbo
in the next patch.
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30250>
SBID SET can only be used on SEND, SENDC, or DPAS instructions. The
existing code was handling SET for SEND/SENDC, but was using the wrong
encoding for DPAS. Add a new case to handle that and make it clear that
the existing code is only for SEND/SENDC.
While here, rewrite the encoder to use 2-bit binary immediates shifted
up into the mode [9:8] field, rather than pre-shifted hex values. This
matches the documentation better and is a little easier to follow.
On the decode side, we were incorrectly decoding MATH instructions.
Because they're marked is_unordered, we were hitting the SEND/SENDC
decoding, which is incorrect for MATH.
Fixes 22 cooperative matrix tests on Lunar Lake.
Huge thanks to Paulo Zanoni for bisecting failures to one of my commits,
then analyzing shaders and experimenting to discover that the failure
was really an unrelated bug, just being provoked by different choices of
registers. His work narrowing the problem down made it much easier to
discover and fix this bug.
Backport-to: 24.2
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30705>
We're going to need to handle encoding/decoding differently for DPAS vs.
SEND/SENDC vs. other instructions. Pass the opcode so we can figure out
the encodings for each type of instruction.
Backport-to: 24.2
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30705>