As previously documented, this mode either uses LRZ or early-z
(when LRZ is invalid).
Though it has some limitations, it's not compatible with:
- Lack of D/S attachment
- Stencil writes on stencil or depth test failure
- Per-sample shading
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34973>
This change is an update to 42be38a8fb. It fixes the remaining
depth24_stencil8 mipmap issues.
This change was tested with the test below modified to check for
every width and height between (1,1) and (143,143), the levels
are tested between 0 and 5.
This change was tested on r600 cypress, palm, barts and cayman.
Here are the tests fixed:
khr-gl(3[0-3]|4[0-5])/texture_repeat_mode/depth24_stencil8_11x131_1_clamp_to_edge: fail pass
khr-gl(3[0-3]|4[0-5])/texture_repeat_mode/depth24_stencil8_11x131_1_mirrored_repeat: fail pass
khr-gl(3[0-3]|4[0-5])/texture_repeat_mode/depth24_stencil8_11x131_1_repeat: fail pass
khr-gles3/texture_repeat_mode/depth24_stencil8_11x131_1_clamp_to_edge: fail pass
khr-gles3/texture_repeat_mode/depth24_stencil8_11x131_1_mirrored_repeat: fail pass
khr-gles3/texture_repeat_mode/depth24_stencil8_11x131_1_repeat: fail pass
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Acked-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34406>
This functionality is useful with the software fp64
implementation. It allows running the remaining
tests.
Note: the same tests do not generate this indirect
access on cayman which has the hardware fp64
implementation enabled.
This change was tested on cypress, palm and barts.
Here are the tests fixed:
spec/arb_gpu_shader_fp64/execution/gs-isnan-dvec: fail pass
spec/arb_gpu_shader_fp64/uniform_buffers/gs-array-copy: fail pass
spec/arb_gpu_shader_fp64/uniform_buffers/gs-dmat4: fail pass
spec/arb_gpu_shader_fp64/uniform_buffers/gs-dmat4-row-major: fail pass
spec/arb_gpu_shader_fp64/uniform_buffers/gs-double-array-const-index: fail pass
spec/arb_gpu_shader_fp64/uniform_buffers/gs-double-array-variable-index: fail pass
spec/arb_gpu_shader_fp64/uniform_buffers/gs-double-bool-double: fail pass
spec/arb_gpu_shader_fp64/uniform_buffers/gs-double-uniform-array-direct-indirect: fail pass
spec/arb_gpu_shader_fp64/uniform_buffers/gs-doubles-float-mixed: fail pass
spec/arb_gpu_shader_fp64/uniform_buffers/gs-dvec4-uniform-array-direct-indirect: fail pass
spec/arb_gpu_shader_fp64/uniform_buffers/gs-nested-struct: fail pass
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34926>
It was accidentally made sticky when LRZ is disabled. That resulted
in a big perf regression in some games.
Fixes: 847ad80e03 ("tu/lrz: Consider FS depth layout when gl_FragDepth is written")
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35029>
v12+ supports this, let's expose it.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Olivia Lee <olivia.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34997>
We had a duplicate function there, let's use common code instead and
allow v4.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Olivia Lee <olivia.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34997>
v12+ supports this, let's expose it.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Olivia Lee <olivia.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34997>
It's still legal to include VkTimelineSemaphoreSubmitInfo in pNext
without any timeline semaphores. While we are at it, also fix the VUs.
This was observed with Doom The Dark Ages.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35022>
The 2 helpers we're using for doing internal operations (copies,
command generation, etc...) can work on command buffers or lower level
batches.
When working with command buffers, the helpers should set the
preemption using genX(cmd_buffer_set_preemption) so that whatever
operation comes after toggles the state back to what it needs and we
minimize the toggles.
When working with batchs, the helpers should disable preemption using
genX(batch_set_preemption) and turn it back on when done.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35030>
Otherwise we end up removing refcount even when we don't have a color
surface already for applications going from SRGB_NONLINEAR to
PASS_THROUGH dring runtime.
To reproduce the bug, start mpv with "--target-colorspace-hint=yes" then
set it to "no" during runtime with a keybind
Fixes: 789507c99c ("vulkan/wsi: implement the Wayland color management protocol")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34708>
Displayable compressed resournces have a different PAT
entry from the non-displayable compressed.
Signed-off-by: Jianxun Zhang <jianxun.zhang@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29928>
Two new heaps are introduced to use a different PAT entry
for compressed buffers to display.
Signed-off-by: Jianxun Zhang <jianxun.zhang@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29928>
We need two PAT entries with compression for displayable and
non-displayable compressed images. The current 'compressed' entry
is renamed to 'scanout_compressed' for the displayable.
Signed-off-by: Jianxun Zhang <jianxun.zhang@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29928>
These were initially observed in Hogwarts Legacy while working on
something else entirely. Two compute shaders in that app are helped
for spills and fills. On Skylake, one of the shaders benefits from
this change, and the other is hurt pretty significantly.
About 40 vertex shaders in Shadow of the Tomb Raider were helped for
instructions.
v2: Use ~0xff instead of 0xffffff00 to ensure the patterns will work
properly with all bit sizes. Noticed by Georg.
v3: No, really, fix the various errors to ensure the patterns will work
properly with all bit sizes. Noticed by Georg.
No shader-db changes on any Intel platform.
fossil-db:
Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake)
Totals:
Instrs: 210566294 -> 210561118 (-0.00%)
Cycle count: 31582309052 -> 31576352808 (-0.02%); split: -0.02%, +0.00%
Spill count: 519300 -> 519280 (-0.00%)
Fill count: 625181 -> 625161 (-0.00%)
Scratch Memory Size: 36289536 -> 36281344 (-0.02%)
Max live registers: 66068413 -> 66068161 (-0.00%)
Non SSA regs after NIR: 60230773 -> 60230775 (+0.00%)
Totals from 1662 (0.24% of 707082) affected shaders:
Instrs: 635064 -> 629888 (-0.82%)
Cycle count: 36549632 -> 30593388 (-16.30%); split: -16.43%, +0.14%
Spill count: 246 -> 226 (-8.13%)
Fill count: 280 -> 260 (-7.14%)
Scratch Memory Size: 16384 -> 8192 (-50.00%)
Max live registers: 178491 -> 178239 (-0.14%)
Non SSA regs after NIR: 169552 -> 169554 (+0.00%)
Tiger Lake
Totals:
Instrs: 238544730 -> 238539407 (-0.00%)
Cycle count: 23679446097 -> 23673238578 (-0.03%); split: -0.03%, +0.00%
Max live registers: 42494925 -> 42494799 (-0.00%)
Non SSA regs after NIR: 63639071 -> 63639074 (+0.00%)
Totals from 1662 (0.21% of 802704) affected shaders:
Instrs: 626604 -> 621281 (-0.85%)
Cycle count: 26444363 -> 20236844 (-23.47%); split: -23.50%, +0.02%
Max live registers: 95405 -> 95279 (-0.13%)
Non SSA regs after NIR: 181150 -> 181153 (+0.00%)
Ice Lake
Totals:
Instrs: 238855310 -> 238826534 (-0.01%)
Cycle count: 24952257277 -> 24944589398 (-0.03%); split: -0.03%, +0.00%
Spill count: 575510 -> 575117 (-0.07%)
Fill count: 713007 -> 708632 (-0.61%)
Max live registers: 42499556 -> 42499432 (-0.00%)
Non SSA regs after NIR: 64388747 -> 64388750 (+0.00%)
Totals from 1662 (0.21% of 805149) affected shaders:
Instrs: 926887 -> 898111 (-3.10%)
Cycle count: 67025583 -> 59357704 (-11.44%); split: -11.45%, +0.01%
Spill count: 5168 -> 4775 (-7.60%)
Fill count: 32883 -> 28508 (-13.30%)
Max live registers: 95614 -> 95490 (-0.13%)
Non SSA regs after NIR: 181150 -> 181153 (+0.00%)
Skylake
Totals:
Instrs: 161904416 -> 161895239 (-0.01%); split: -0.01%, +0.00%
Cycle count: 20098067714 -> 20090767583 (-0.04%); split: -0.04%, +0.00%
Spill count: 525546 -> 525789 (+0.05%); split: -0.04%, +0.09%
Fill count: 603369 -> 602276 (-0.18%); split: -0.28%, +0.10%
Max live registers: 33895714 -> 33895590 (-0.00%)
Non SSA regs after NIR: 57348729 -> 57348730 (+0.00%)
Totals from 1655 (0.25% of 653734) affected shaders:
Instrs: 769979 -> 760802 (-1.19%); split: -1.83%, +0.64%
Cycle count: 51365416 -> 44065285 (-14.21%); split: -14.22%, +0.01%
Spill count: 4186 -> 4429 (+5.81%); split: -4.90%, +10.70%
Fill count: 16356 -> 15263 (-6.68%); split: -10.50%, +3.82%
Max live registers: 95115 -> 94991 (-0.13%)
Non SSA regs after NIR: 180797 -> 180798 (+0.00%)
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34905>
All of the current instances of writing info values from `Vec`s involve
building an iterator and then collecting it specifically for writing.
By using an `ExactSizeIterator`, we can avoid the need for allocating in
these cases.
v2: use existing `CLInfoValue::write_iter()` instead of custom type
Reviewed-by: @LingMan
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34389>
Making subgroup sizes an iterator avoids collecting (and thus
allocation) in cases where the values are unneeded or only the first is
needed.
v2: fix calculation of `SetBitIndices<u32>` iterator length
Reviewed-by: @LingMan
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34389>
Iterating arrays and collecting to a `Vec` requires allocating memory
for the `Vec` and, when the needed result is an array of the same size
as the original, an unnecessary fallible conversion back to an array.
While arrays have a `map()` method for infallible conversions,
`try_map()` is unstable. Fortunately, we only have to worry about one
array size and it's small, so hand-mapping is a viable alternative.
Reviewed-by: @LingMan
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34389>
info->arg_sizes and info->strings were leaked because they were
allocated in the global context.
Fixes: 007f60c8b8 ("util/u_printf: add singleton implementation")
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34987>
On Android, both the ANB and AHB extensions support still requires
renderer external memory support along with format modifier and foreign
queue support.
On Linux, however, renderer external memory support alone is sufficient
to expose external memory extensions. This also helps not to force sw
wsi device when renderer has external memory support.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34830>
Use the now AArch64 optimized util_streaming_load_memcpy() routine to
copy the AFBC superblocks from non-cacheable to cacheable memory.
Signed-off-by: Loïc Molinari <loic.molinari@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34606>