Things are stable enough now to increase the number of tests per group,
which allows us to lower the DEQP_FRACTION to 3. This also allows
us to lower the 'parallel' property to 4 leaving one extra board for
other jobs to run.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38923>
dEQP-VK.image.general_layout.memory_barrier.fragment.write_read.*
sometimes pass, which causes UnexpectedImprovement(Pass) to show up,
but the bug still exists.
Add those to the flake list until this is fully sorted out.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38923>
If we don't do that and something fails in the middle, we leak
the decode context.
Fixes: d155d6b7a3 ("panvk: Add a decode context at the panvk_device level")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38923>
cs_finish() is doing two things:
1. wrapping up the CS to prepare for its execution
2. freeing the temporary instrs array and maybe_ctx allocations
Mixing those two things leads to confusion and leaks, so let's split
those into cs_end() and cs_builder_fini(), and make sure panvk/panfrost
call both when appropriate.
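A minimal sketch of the split, using a toy builder rather than the real
cs_builder (all toy_* names and fields are hypothetical). It only shows
why separating "wrap up" from "free temporaries" makes error paths safe:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-in for a command stream builder; not the real cs_builder. */
struct toy_builder {
   unsigned *instrs; /* temporary allocation owned by the builder */
   unsigned count;
   bool ended;
};

static bool
toy_builder_init(struct toy_builder *b, unsigned capacity)
{
   assert(capacity > 0);
   b->instrs = calloc(capacity, sizeof(*b->instrs));
   b->count = 0;
   b->ended = false;
   return b->instrs != NULL;
}

/* Step 1: wrap up the stream for execution. Frees nothing. */
static void
toy_end(struct toy_builder *b)
{
   assert(!b->ended);
   b->ended = true;
}

/* Step 2: release temporaries. Valid whether or not toy_end() ran,
 * so a failure in the middle of recording cannot leak them. */
static void
toy_builder_fini(struct toy_builder *b)
{
   free(b->instrs);
   b->instrs = NULL;
   b->count = 0;
}
```

On the success path a caller runs toy_end() followed by
toy_builder_fini(); on an error path it can skip straight to
toy_builder_fini().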
Fixes: 50d2396b7e ("pan/cs: add helpers to emit contiguous csf code blocks")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38923>
own_bin needs to be set to true if we want the bin_ptr to be freed.
Fixes: 3d2cc01f8a ("panvk: Add create_shader_from_binary")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38923>
s/util_dynarray_clear/util_dynarray_fini/ to fix the leak.
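To illustrate the difference, here is a toy dynamic array (the toy_*
names are hypothetical and this is not the util_dynarray code): a
"clear" only resets the logical size and keeps the allocation, while a
"fini" releases it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy dynamic array, for illustration only. */
struct toy_dynarray {
   void *data;
   size_t size;     /* bytes in use */
   size_t capacity; /* bytes allocated */
};

static bool
toy_dynarray_reserve(struct toy_dynarray *a, size_t capacity)
{
   assert(capacity > 0);
   void *data = realloc(a->data, capacity);
   if (!data)
      return false;
   a->data = data;
   a->capacity = capacity;
   return true;
}

/* Resets the contents but keeps the allocation around for reuse. */
static void
toy_dynarray_clear(struct toy_dynarray *a)
{
   a->size = 0;
}

/* Releases the allocation; this is what a teardown path needs. */
static void
toy_dynarray_fini(struct toy_dynarray *a)
{
   free(a->data);
   a->data = NULL;
   a->size = 0;
   a->capacity = 0;
}
```

Calling only the clear variant at teardown leaves data allocated, which
is the kind of leak described above.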
Fixes: 7dc4f28507 ("pan/bi: schedule simple iterators to avoid extra move")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38923>
The desc_heap field is unconditionally initialized, so we need to
call util_vma_heap_finish() on it.
Fixes: ec02137c86 ("panvk: Support DESCRIPTOR_POOL_CREATE_HOST_ONLY_BIT")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38923>
VK_EXT_multisampled_render_to_single_sampled needs those to be able
to render to the MS attachment when the app only provides a single-
sampled one.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38825>
Couldn't find a reference in the docs requiring the types to match,
and simulator + MTL seem fine with mixing UD and UW, so not adding
a replacement for the removed assertions.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38877>
Most stages call this as part of brw_nir_postprocess_opts() but mesh
lowers to URB intrinsics after that since it needs bit-sizes lowered.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
We don't bother with maximums or wrapping because it shouldn't come up
for IO intrinsics anyway.
fossil-db results on Battlemage:
Instrs: 231363032 -> 231359554 (-0.00%)
Cycle count: 34057005552.0 -> 34057236190.0 (+0.00%); split: -0.00%, +0.00%
Max live registers: 71873886 -> 71870438 (-0.00%)
Non SSA regs after NIR: 67159408 -> 67159523 (+0.00%)
Totals from 1779 (0.23% of 788851) affected shaders:
Instrs: 774359 -> 770881 (-0.45%)
Cycle count: 10551280.0 -> 10781918.0 (+2.19%); split: -0.32%, +2.51%
Max live registers: 158193 -> 154745 (-2.18%)
Non SSA regs after NIR: 180104 -> 180219 (+0.06%)
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
We keep this separate from the other lowering infrastructure because
there's no semantic IO involved here, just byte offsets. Also, it needs
to run after nir_lower_mem_access_bit_sizes, which means it needs to be
run from brw_postprocess_opts. But we can't do the mesh URB lowering
there because that doesn't have the MUE map.
It's not that much code as a separate pass, though.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
With all the infrastructure in place, this is largely a matter of
calling the lowering passes with the appropriate data from the MUE map.
MUE initialization is now done with semantic IO instead of raw offsets.
This drops another case of non-standard NIR IO usage (and no_validate).
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
(Split by Ken from a larger patch originally written by Lionel.)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
(Based on the original implementation by Lionel Landwerlin, but adapted
to my respun URB lowering framework.)
The mesh shader URB payload requires reading and writing fields at
arbitrary DWord offsets. For example, the Primitive Indices array
starts at DWord 1, and it can be a vec1[], vec2[], or vec3[] array,
leading to very unaligned and sometimes double-parked elements.
Still, most fields are conveniently vec4-aligned.
To handle this, we add a new cb_data::vec4_access flag. If set, access
remains in vec4 units, with vec4 alignment. We use this for non-mesh
stages. When unset, offset is in 32-bit units, allowing unaligned
DWord access.
This is trivial to support on Xe2, where the LSC URB messages support
arbitrary byte-aligned addressing. On older platforms, we have to
convert this to vec4 aligned offsets plus a component offset (either
returning a subset of the channels loaded, or using component masking
to store a subset of a vec4/vec8).
Thankfully, since the OWord URB messages support accessing a vec8 at
a time, this means we can do any vec4 access in one message, even if
it's double-parked. We use mod-analysis to see if we can statically
determine the sub-vec4 component offset required (we often can). If
not, we use the ability to have dynamic writemasks to sort it out.
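As a rough sketch of the pre-Xe2 conversion when the offset is known
statically (the struct and function names are hypothetical, not the
ones in the pass), a DWord offset splits into a vec4-aligned offset, a
component within that vec4, and a channel mask over the vec8 window:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one access; illustrative names only. */
struct toy_urb_access {
   uint32_t vec4_offset;  /* vec4-aligned offset, in vec4 units */
   uint32_t component;    /* first DWord within that vec4 (0..3) */
   uint32_t channel_mask; /* channels of the vec8 window being touched */
};

static struct toy_urb_access
toy_split_dword_access(uint32_t dword_offset, unsigned num_components)
{
   assert(num_components >= 1 && num_components <= 4);

   struct toy_urb_access a;
   a.vec4_offset = dword_offset / 4;
   a.component = dword_offset % 4;
   /* The highest channel touched is at most 3 + 4 - 1 = 6, so the mask
    * always fits in 8 bits: one vec8 window covers any vec4 access,
    * including one that straddles two vec4 slots. */
   a.channel_mask = ((1u << num_components) - 1u) << a.component;
   return a;
}
```

For example, a vec3 at DWord 1 gives vec4 offset 0, component 1 and mask
0x0e, while a vec4 at DWord 7 gives vec4 offset 1, component 3 and mask
0x78, which spills into the second half of the vec8.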
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
When considering ((x << y) % divisor), we recursed to calculate
mod = (x % (divisor >> y)) but incorrectly returned mod directly,
rather than the correct value, (mod << y).
(Note that we require divisor to be a power-of-two.)
As an example of this going wrong, (x << 1) % 4 was returning (x % 2)
which is 0 or 1, but x << 1 is 2x, which is always an even number so
the result mod 4 can only be 0 or 2.
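The identity can be checked with a small standalone helper (hypothetical
name, not the NIR code), assuming a power-of-two divisor and a shift no
larger than log2(divisor):

```c
#include <assert.h>
#include <stdint.h>

/* Computes (x << y) % divisor by reducing x first and then shifting the
 * remainder back up. Illustrative only; not the nir_mod_analysis code. */
static uint32_t
toy_shl_mod_pow2(uint32_t x, unsigned y, uint32_t divisor)
{
   assert(y < 32);
   assert(divisor != 0 && (divisor & (divisor - 1)) == 0);
   assert((divisor >> y) != 0);

   uint32_t mod = x % (divisor >> y); /* the recursive step */
   return mod << y;                   /* the shift that was missing */
}
```

With x = 3, y = 1 and divisor = 4 this returns 2, matching (3 << 1) % 4,
whereas returning mod directly would give 1.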
Unit test suggested by Caio Oliveira during review.
Fixes: 2255375c4d ("nir: add nir_mod_analysis & its tests")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
I forgot to copy this over in the LSC case, so we were missing
reorderability and, as a result, missing out on CSE.
fossil-db results on Battlemage:
Instrs: 231471427 -> 231363032 (-0.05%)
Send messages: 12077759 -> 12019628 (-0.48%)
Cycle count: 34058451430.0 -> 34057005552.0 (-0.00%); split: -0.01%, +0.00%
Spill count: 520387 -> 520135 (-0.05%)
Fill count: 470812 -> 470722 (-0.02%)
Max live registers: 72111834 -> 71873886 (-0.33%)
Totals from 2898 (0.37% of 788851) affected shaders:
Instrs: 1223836 -> 1115441 (-8.86%)
Send messages: 148633 -> 90502 (-39.11%)
Cycle count: 17732554.0 -> 16286676.0 (-8.15%); split: -10.65%, +2.49%
Spill count: 252 -> 0 (-inf%)
Fill count: 90 -> 0 (-inf%)
Max live registers: 491684 -> 253736 (-48.39%)
Non SSA regs after NIR: 255397 -> 255402 (+0.00%)
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
This now lowers IO intrinsics to URB intrinsics in a single step,
rather than modifying IO intrinsics to have non-standard meanings
temporarily. We are able to drop one "no_validate" flag.
For example, remap_patch_urb_offsets had added (vertex * stride) to
(offset) for per-vertex IO intrinsics, but left them as per-vertex
intrinsics. Now we just have an urb_offset() function to calculate
that when doing the lowering.
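In spirit, the calculation looks like the sketch below. The name and
signature are hypothetical; the real urb_offset() works on NIR values
rather than plain integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical integer model of the offset calculation: per-vertex
 * access adds (vertex * stride) to the intrinsic's own offset. */
static uint32_t
toy_urb_offset(uint32_t offset, bool per_vertex,
               uint32_t vertex, uint32_t stride)
{
   assert(!per_vertex || stride > 0);
   return per_vertex ? vertex * stride + offset : offset;
}
```

Doing this at lowering time means the per-vertex intrinsic never has to
carry a pre-adjusted offset.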
This also provides a central location for calculating URB offsets,
which we should be able to extend for other uses (per-view lowering,
mesh per-primitive lowering) in future patches.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38918>
Fixes the following build error with clang:
FAILED: src/amd/vulkan/libvulkan_radeon.so.p/nir_radv_nir_rt_traversal_shader.c.o
...
../src/amd/vulkan/nir/radv_nir_rt_traversal_shader.c:1159:49: error: use of GNU empty initializer extension [-Werror,-Wgnu-empty-initializer]
struct radv_nir_rt_traversal_params params = {};
^
1 error generated.
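For reference, "= {}" is a GNU extension before C23, while "= {0}" is
standard C and also zero-initializes every member. A minimal sketch with
a hypothetical struct (the exact spelling used by the fix is not quoted
here):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct standing in for the real params type. */
struct toy_params {
   int count;
   float scale;
   void *ptr;
};

static struct toy_params
toy_make_zeroed_params(void)
{
   /* Portable: the first member is set explicitly, and all remaining
    * members are implicitly zero-initialized. */
   struct toy_params p = {0};
   return p;
}
```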
Fixes: f692ac76 ("radv/rt: Use traversal vars for object origin/direction in ahit/isec")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38954>
GFX10 hangs when drawing from a 0-sized index buffer.
GFX6 has a HW bug when the index buffer address is 0.
Looking at VK CTS runs, GFX6 still triggers VM faults despite the
current mitigation, and it also tries to access memory when the
index buffer is zero sized. So it looks like GFX6 and GFX10
really have the same bug.
Let's share the mitigation between the two.
Use a zero-filled BO instead of the upload buffer.
This fixes VM faults on GFX6, and should speed up GFX10 a bit.
Note that the zero-filled BO is also going to be used for
other bug mitigations on GFX6-7.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38958>
SGPR offset is not included in the bounds check
according to the ISA documentation of GFX6-7 and
indeed it can trigger VM faults on OOB access.
Note that ACO already doesn't use the SGPR offset
on GFX6-7 for buffer loads and stores. This commit
just does the same for buffer atomics.
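A deliberately simplified model of the problem follows. The toy_* names
are hypothetical and the real hardware rule has more inputs; the only
property modeled is that the SGPR offset is invisible to the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct toy_buffer_access {
   uint32_t vgpr_offset; /* per-lane offset, seen by the bounds check */
   uint32_t sgpr_offset; /* uniform offset, NOT seen by the bounds check */
};

/* Simplified range check: only the VGPR offset is compared. */
static bool
toy_check_passes(struct toy_buffer_access a, uint32_t num_bytes)
{
   return a.vgpr_offset < num_bytes;
}

/* Byte offset that is actually accessed when the check passes. */
static uint32_t
toy_effective_offset(struct toy_buffer_access a)
{
   return a.vgpr_offset + a.sgpr_offset;
}

/* Mitigation: fold the SGPR offset into the VGPR offset so that the
 * check covers the whole offset. */
static struct toy_buffer_access
toy_fold_sgpr_offset(struct toy_buffer_access a)
{
   assert(a.vgpr_offset <= UINT32_MAX - a.sgpr_offset);
   struct toy_buffer_access folded = { a.vgpr_offset + a.sgpr_offset, 0 };
   return folded;
}
```

In this model, a 64-byte buffer accessed with VGPR offset 0 and SGPR
offset 4096 passes the check but lands out of bounds. After folding, the
same access is rejected.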
This commit mitigates a ton of VM faults that are exposed by:
24e75fea4b
Fossil DB stats on Hawaii (GFX7):
Totals from 148 (0.24% of 61818) affected shaders:
Instrs: 324004 -> 327352 (+1.03%)
CodeSize: 1556468 -> 1514100 (-2.72%); split: -2.74%, +0.02%
Latency: 1271480 -> 1276894 (+0.43%)
InvThroughput: 396850 -> 397740 (+0.22%)
VClause: 6861 -> 6858 (-0.04%)
Copies: 34083 -> 37430 (+9.82%)
PreVGPRs: 5705 -> 5706 (+0.02%)
VALU: 147529 -> 150898 (+2.28%)
SALU: 98194 -> 98172 (-0.02%)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38958>
While the concept of "VRAM" is somewhat nebulous on SVGA devices this is
the value above which some performance degradation is likely to occur.
Signed-off-by: Ian Forbes <ian.forbes@broadcom.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38818>
We should not manipulate the session buffers at command recording time.
It shouldn't cause any problems as these initialized probability tables
are not modified by firmware, but moving these to bind time should be
safer and also faster if an application frequently RESETs.
Reviewed-by: David Rosca <david.rosca@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38926>
This block of code will be re-used in a future patch. It also slightly
reduces the size and complexity of lower_sampler_logical_send().
No changes in behavior intended here.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38792>