We enable VK_EXT_pipeline_protected_access only if
anv_physical_device::has_protected_contexts. Therefore we should do the
same for vk_features::pipelineProtectedAccess.
Fixes: 0b5408f ("anv: expose VK_EXT_pipeline_protected_access")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32206>
(cherry picked from commit 56116c4da5)
This fixes a corner case of the LNL sub-dword integer restrictions
that wasn't being detected by has_subdword_integer_region_restriction(),
specifically:
> if(Src.Type==Byte && Dst.Type==Byte && Dst.Stride==1 && W!=2) {
> // ...
> if(Src.Stride == 2) && (Src.UniformStride) && (Dst.SubReg%32 == Src.SubReg/2 ) { Allowed }
> // ...
> }
All the other restrictions that require agreement between the SubReg
number of source and destination only affect sources with a stride
greater than a dword, which is why
has_subdword_integer_region_restriction() was returning false except
when "byte_stride(srcs[i]) >= 4" evaluated to true, but as implied by
the pseudocode above, in the particular case of a packed byte
destination, the restriction applies for source strides as narrow as
2B.
The form of the equation that relates the subreg numbers is consistent
with the existing calculations in brw_fs_lower_regioning (see
required_src_byte_offset()), we just need to enable lowering for this
corner case, and change lower_dst_region() to call lower_instruction()
recursively, since some of the cases where we break this restriction
are copy instructions introduced by brw_fs_lower_regioning() itself
trying to lower other instructions with byte destinations.
This fixes some Vulkan CTS test-cases that were hitting these
restrictions with byte data types.
Fixes: 217d412360 ("intel/fs/gfx20+: Implement sub-dword integer regioning restrictions.")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32261>
We were accidentally doing a signed integer comparison here for ult32,
or a sign-extending shift for ushr.
One notable bit of fallout was that load_global_uniform_block_intel
address calculations broke on platforms that don't have native 64-bit
integer support, as the iadd64 lowering for "do I need to carry?" was
using ult32...and performing the wrong comparison. We spotted this in
Borderlands 3 on Alchemist once we turned on other optimizations.
Thanks to Lionel Landwerlin for helping spot the problem!
Fixes: c7b312ad45 ("brw: factor out source extraction for rematerialization")
Fixes: 339630ab05 ("brw: enable A64 loads source rematerialization")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit 5848035443)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32261>
Since the shader parameters are passed as inline data, push constants
are no longer used and so, not actually set on dispatch. But the
nr_params = 4 was still making the shader emit the code to load them,
causing page faults on simulation, and would also on HW if we didn't
always have a scratch page set.
The uses_inline_data parameter will be set from brw_compile_cs(), called
shortly after this point, so we don't need it here.
The subgroup_size is misleading, as we don't actually require that size
and the code that checks for it isn't even running for this shader.
Fixes: 97b17aa0b1 ("brw/nir: rework inline_data_intel to work with compute")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12152
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit d32a26b3e6)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32261>
All of the spilling code should work with physical register units
because for example SEND messages will expect a physical register as
destination.
So always allocate a full physical register for the spilled/unspilled
values and adjust the offsets of the registers to physical sizes too.
Cc: mesa-stable
Fixes: aa494cba ("brw: align spilling offsets to physical register sizes")
Closes: mesa/mesa#11967
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Found-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit a21cd8c5b6)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32261>
Some limits got stuck to the old binding table limits. Those don't
apply anymore since EXT_descriptor_indexing was implemented.
Fixes: 6e230d7607 ("anv: Implement VK_EXT_descriptor_indexing")
Fixes: 96c33fb027 ("anv: enable direct descriptors on platforms with extended bindless offset")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
(cherry picked from commit d6acb56f11)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32119>
Otherwise anv_descriptor_set is accessed through an unaligned pointer,
which is undefined behavior in C.
```
anv_descriptor_set.c:1620:17: runtime error: member access within misaligned address 0x61900002c2b5
for type 'struct anv_descriptor_set', which requires 8 byte alignment 0x61900002c2b5
```
Fixes: 2570a58bcd ("anv: Implement descriptor pools")
(cherry picked from commit a2c4a34303)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32119>
While the documentation says to use NUM_SIMD_LANES_PER_DSS for the stack
address calculation, what the HW actually uses is
NUM_SYNC_STACKID_PER_DSS. The former may vary depending on the platform,
while the latter is fixed to 2048 for all current platforms.
Fixes: 6c84cbd8c9 ("intel/dev/xe: Set max_eus_per_subslice using topology query")
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit aee04bf4fb)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32119>
Copy propagation would incorrectly occur in this code
mov(16) v4+2.0:UW, u0<0>:UW NoMask
...
mov(8) v6+2.0:UD, v4+2.0:UD NoMask group0
to create
mov(16) v4+2.0:UW, u0<0>:UW NoMask
...
mov(8) v6+2.0:UD, u0<0>:UD NoMask group0
This has different behavior. I think I just made a mistake when I
changed this condition in e3f502e007.
It seems like this condition could be relaxed to cover cases like (note
the change of destination stride)
mov(16) v4+2.0<2>:UW, u0<0>:UW NoMask
...
mov(8) v6+2.0:UD, v4+2.0:UD NoMask group0
I'm not sure it's worth it.
No shader-db or fossil-db changes on any Intel platform. Even the code
for the test case mentioned in the original commit did not change.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes: e3f502e007 ("intel/fs: Allow copy propagation between MOVs of mixed sizes")
Closes: #12116
(cherry picked from commit 80a5d158ae)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32119>
Specifically, allow two immediate sources for BFE on Gfx12+. I stumbled
on this while trying some stuff with !31852.
v2: Don't be lazy. Add proper assertions for all the things on all the
platforms. Based on a suggestion by Ken.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes: 7bed11fbde ("intel/brw: Allow immediates in the BFE instruction on Gfx12+")
(cherry picked from commit c1c09e3c4a)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32119>
Corrects regression caused by prior commit that created memory
overwrite by not mallocing enough space for filename string.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32013>
This has the best fossil-db results across in a sweep from 0..15.
fossil-db results on Alderlake:
Instructions in all programs: 152849904 -> 152824116 (-0.0%)
SENDs in all programs: 7677830 -> 7677830 (+0.0%)
Loops in all programs: 48470 -> 48470 (+0.0%)
Cycles in all programs: 11988670382 -> 11987530942 (-0.0%)
Spills in all programs: 42863 -> 41777 (-2.5%)
Fills in all programs: 77114 -> 73044 (-5.3%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31990>
A while back Matt enabled shader_spilling_rate by default for anv.
But intel_clc doesn't use the driconf mechanism that we use there.
The GRL shaders spill a lot, and with us now compiling additional
generations of the shaders, Mesa build time is getting prohibitively
expensive. By setting this, we drop the time taken for a clean debug
build by approximately 35% on my current laptop.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31993>
Now actually making use of new Xe KMD OA syncronization uAPI.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31283>
Xe KMD added a uAPI to syncronze metrics id changes, so we can make
it wait for all previous workloads in exec_queue and all previous
metrics id changes to finish before start change it again.
This should make Vulkan queries more robust.
So this makes use of intel_bind_timeline to syncronize the metrics id
changes and xe_queue_get_syncobj_for_idle() to syncronize with
exec_queue.
As i915 and some versions of Xe KMD will not support it, this feature
will only be used then intel_bind_timeline parameter is not NULL and
timeline has a valid syncobj id.
At this patch level all callers will set it to NULL, next patch will
add and initialize timeline in ANV when supported by Xe KMD.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31283>
ARL has enough differences in OA files to have its own set of files.
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31685>
We emit it for gfx8, so the assembler should support it too.
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31747>
Previously this bit was not clearly documented in PRMs, but gfx12 PRMs
finally list all the instructions where it is present.
Although it's unclear if it's functional for anything other than "if",
"else", and "goto", we probably still should acknowledge its existence
in other instructions.
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31747>
We emit it for gfx9, so the assembler should support it too.
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31747>
Previously this bit was not clearly documented in PRMs, but gfx12 PRMs
finally list all the instructions where it is present.
Although it's unclear if it's functional for anything other than "if",
"else", and "goto", we probably still should acknowledge its existence
in other instructions.
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31747>
This is a workaround that is still in progress, see HSD 22020521218.
If we don't have these NOPs we may see rendering corruption or even
GPU hangs.
While we still don't fully understand the issue from the hardware
point of view, let's have this workaround so we can pass CTS and move
things forward. If we need to change this later, we can. Besides, the
impact is minimal. Shaderdb/fossilize report no changes for this
patch.
On our Blackops trace, the lack of this patch causes corruption in fog
rendering (rectangles where fog was supposed to be shown don't show
the fog).
On dEQP-VK.graphicsfuzz.cov-array-copies-loops-with-limiters, without
this patch we get a GPU hang.
Backport-to: 24.2
Testcase: dEQP-VK.graphicsfuzz.cov-array-copies-loops-with-limiters
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11813
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31331>
Rather than updating intel_device_info_update_l3_banks(), the Xe KMD
provides this info via the DRM_XE_DEVICE_QUERY_GT_TOPOLOGY query item.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31894>
This bit is not needed for barriers and appears to trigger a
performance regression. So leave it for just for AUX-TT
flushing/invalidation.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: e3814dee1a ("anv: add plumbing/support for L3 fabric flush")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12090
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31915>
Stuff COMPUTE_WALKER_BODY in COMPUTER_WALKER in both iris and anv.
This also fixes the tracepoint for ray dispatches. Stuffing
COMPUTE_WALKER_BODY allow us to set the
cmd_buffer->state.last_compute_walker.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31822>
intel_device_info_update_after_hwconfig() updates max_cs_threads
based on max_eus_per_subslice and num_thread_per_eu but in some
platforms simulator the hwconfig don't have the
INTEL_HWCONFIG_MAX_NUM_EU_PER_DSS value, causing max_cs_threads to
be set to a wrong value and then causing issues when programing
CFE_STATE with a invalid value.
Fortunately we can also get max_eus_per_subslice from topology query,
so here moving the hwconfig query and
intel_device_info_update_after_hwconfig() call to after topology.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31850>
If a render area covers an area that is smaller than an attachment's
extent and is not aligned to the CCS block size, we must load the clear
color so that the pixels outside of that area are decompressed with the
right clear color.
Prevents the next patch from causing the following test failure on gfx9:
dEQP-VK.renderpass.suballocation.load_store_op_none.color_load_op_none_store_op_none
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31743>
Store an array of clear values, one for each view format of the image.
Load the clear value based on the view format.
anv_image_msaa_resolve() may override the source or destination with
ISL_FORMAT_UNSUPPORTED, so make anv_image_get_clear_color_addr() handle
that format.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31743>
In later commits, we'll rely on the number of view formats used by an
image to determine the size allocated for an array of clear colors in
the aux-state tracking buffer. Having a single view format for dmabufs
with clear color support allows anv to transparently handle this case.
Restrict the number of view formats by explicitly setting the image
format list to incomplete. Secondly, loosen the non-zero clear color
restriction on clear color supporting dmabufs. Those images can support
any clear color even with an incomplete list because we restrict
problematic accesses for the clear color during the negotiation phase.
Lastly, update add_all_surfaces_explicit_layout() to assert that the
sizing of the imported clear color struct meets expectations.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31743>
This was a VKCTS bug on earlier version of the CTS.
These tests have been actually passing since the VKCTS was uprevved to
1.3.9.0, which landed a bit before ADL testing in CI was turned on.
Signed-off-by: Valentine Burley <valentine.burley@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31862>
There's currently no GL or GLES testing on the iris gallium driver,
and the VKCTS expectations were erroneously listed under iris-*.txt.
Fix the rules set for anv-adl-full, change the GPU_VERSION to anv-adl
and move the expectations around accordingly.
Signed-off-by: Valentine Burley <valentine.burley@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31862>
The InlineData passed to the shader is a fixed size unrelated to the
register size. It happens to match pre-Xe2, but by considering it the
same in Xe2, we ended up reading pushed constants from the wrong place
when they didn't fit in the InlineData.
Fixes: 97b17aa0b1 ("brw/nir: rework inline_data_intel to work with compute")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31856>