All drivers update the clear color themselves. So, drop the
functionality from BLORP as well as the flag controlling it.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30824>
Instead of using the clear color conversion feature by the hardware, use
software to write out the converted clear color pixel.
When testing a patch which moves a state cache invalidate to occur after
fast clears instead of before, this prevents the following failures on
tgl/zink:
* piglit.spec.arb_texture_cube_map_array.arb_texture_cube_map_array-cubemap
* piglit.spec.ext_framebuffer_object.fbo-generatemipmap-formats
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30646>
The hardware's clear color conversion feature unfortunately requires
invalidating the texture cache for every fast clear. To avoid the
performance penalty that comes with the invalidation, avoid using the
hardware feature and write out the converted clear color pixel
ourselves.
When testing a patch which moves a state cache invalidate to occur after
fast clears instead of before, this prevents the following failures on
icl/zink:
* piglit.fast_color_clear.fcc-read-after-clear sample tex
* piglit.spec.arb_clear_texture.arb_clear_texture-cube
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30646>
This improves drivers in the following ways:
* iris_hiz_exec() and crocus_hiz_exec() gets rid of the narrowly-used
update_clear_depth parameters.
* iris avoids fast-clearing if the aux state is CLEAR. crocus avoids
this too, but didn't actually need it in the first place.
* iris updates the value once per fast_clear_depth() call instead of
doing an update for each layer being cleared.
* anv now updates the clear value when transitioning from an undefined
layout instead of doing so on every fast-clear. This should be safer
because we don't perform state cache invalidates when changing the
clear value. So, existing surface states won't have any stale values.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30520>
The new workaround tries to strike a balance between simplicity and
functionality (for testing purposes). Instead of checking for the
alignment of a specific LOD when fast-clearing, we take an
all-or-nothing approach for LOD1+.
I haven't found any app to clear LOD1+ except for a Dirt Rally trace
some time ago. If I remember correctly, that trace clears all LODs,
doesn't render to them, then clears again with a different color,
incurring resolves. So, skipping LOD1+ fast clears will avoid those
resolves. Other apps I tested include Synmark2, glmark2, GfxBench5, and
the Vulkan games in internal our benchmarking tool.
Now that we've added updated and simplified checks in the drivers
themselves, we delete blorp_can_hiz_clear_depth.
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30250>
None of our tracked games use partial depth clears, so only allow it in
simple cases for testing purposes. This change also fixes an issue on
gfx8, where we had been accidentally disabling full surface clears if
the LOD was not 8x4 aligned.
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30250>
I did some more debugging of this feature, but this time with a modified
version of the piglit test, ./bin/depthstencil-render-miplevels.
I modified the test to:
* Control which LOD to stop populating/clearing
* Print out the results of readpixels to stderr
From there, I could see how different surface dimensions affected
fast-clears. Depending on the surface dimensions, fast-clearing an LOD
above the LOD0 could cause other LODs to be cleared and/or cause the
targeted LOD to be only partially cleared (for example, when the LOD0
dimension is 66x66 and the test doesn't clear LOD3+). This never happens
when fast-clearing LOD0 however.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5258
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30250>
For correct fast-clearing with HiZ+CCS, we require roughly 16x8
alignment of LODs. The next patch will cause drivers to ignore the
alignment of LOD0, so align the qpitch to 8 to avoid breakage and so
that fast clears will be enabled more often.
Prevents failures with the piglit test case:
./bin/fbo-depth-array depth-clear -fbo
in the next patch.
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30250>
Instead of unconditionally emitting a pipe control on gfx11+, use the
workaround helpers for workarounds 1408224581 and 14014097488. Also, add
a check for workaround 14016712196, which is also impacted.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29922>
From Bspec 56423 (r58507), the legacy full resovling and
partial resolving options are gone since Xe2. They also
cause hang on Xe2 if not disabled.
Some suggested code from Nanley Chery <nanley.g.chery@intel.com> is
included.
Signed-off-by: Jianxun Zhang <jianxun.zhang@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29906>
Previously, we left it NULL until later in the compile. However, some
builder helpers are starting to check the options and they blow up when
options == NULL.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27617>
The MCS region to ambiguate needs to shift 4KB from its
starting address. The first 4KB is reserved for hardware.
Signed-off-by: Jianxun Zhang <jianxun.zhang@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28919>
The scaledown rectangle of MSAA fast clear on Xe2 is 8 times
in X and 2 in Y dimension of previous platforms.
Absorb refactoring change suggested by
Nanley Chery <nanley.g.chery@intel.com>
Signed-off-by: Jianxun Zhang <jianxun.zhang@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28919>
I'd like to phase out the ISL surface representation of CCS on gfx120 in
order to enable CCS without a 512B-aligned main surface pitch. Remove
the dependency on CCS ISL surfaces when fast-clearing to move drivers
one step towards that goal.
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28536>
The vertical alignment of the fast-clear rectangle shrinks as the
bits-per-block of the CCS format increases.
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28536>
Up to now preferred SLM size was being set to maximum preferred SLM
size for GFX 12.5 platforms and to workgroup SLM size for Xe2 but
neither of those values are the optimal.
The optimal value is:
<number of workgroups that can run per subslice> * <workgroup SLM size>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28910>
Xe2 has 2 requirements for preferred SLM size:
- this value needs to be >= then SLM size
- this value must be less than shared SLM/L1$ RAM in the sub-slice of platform
Also Xe2 don't have the special '0' encode that sets preferred SLM
allocation size to the maximum supported.
So here setting a value that is equal or larger than SLM size.
It was always setting SLM_ENCODES_128K for LNL A0 stepping probably
because of Wa_16018610683 but this restriction applies to all Xe2
platforms, also because of the first restriction mentioned here
this workaround is not being properly implemented, will fix that
in the next patch.
We should have a formula to calculate a preferred SLM allocation size
for gfx125 and Xe2 platfoms but until that this is enough to fix at
least the applications and tests below on LNL:
- GFXBench Aztec Ruins VK
- GravityMark VK
- Wildlife Extreme VK
- 5 crucible tests
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28910>
This functions were inlined in a header and duplicated between brw and
elk.
That would be enough reasons to move to a C file but next patches
will add more code to support Xe2 platforms, what would cause more
code to be inlined, duplicating even more code and increasing lib
size.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28910>
According to the simulator a cacheline of the blend state cache
corresponds to 3 cachelines of L3 that are always filled regardless of
the number of render targets in use. Allocate enough space to avoid
pagefaults under simulation, since a scratch page isn't bound by
default.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28283>
Of the dynamic states we have blorp reemit for each operations, a few
actually never change :
* BLEND_STATE (it looks like it does, but actually for anv no)
* COLOR_CALC_STATE
* CC_VIEWPORT
* SAMPLER_STATE
We add infrastructure here to upload into the driver and retrieve the
state offset later.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28368>
There are multiple problems currently :
- blorp blitter commands overwrite the protection value coming from
the driver
- anv & iris are using render target MOCS for compute commands
Driver already have the ability to pass the MOCS values so we choose
to stick to that in this change. But now the driver need to select the
right MOCS depending on the engine the commands are going to run onto.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27956>
Previously we were optimistic and tied this to certain format but wa
description lists other formats and bspec clearly disallows the usage.
Issue can be seen with different 16bpp tests, effect looks a bit like
dithering pattern but it is not, it is just rep16 failing.
Fixes:
GTF-GL46.gtf42.GL3Tests.texture_storage.texture_storage_texture_as_framebuffer_attachment
on DG2 and MTL, some 565 EGL tests on Android and internal issue on game
that displays a dither like pattern on the background while it's not
supposed to do that.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10646
Cc: mesa-stable
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27794>