HF sources to math instructions cannot be scalar. This is very similar
to an old Gfx6 restriction on POW, so let's fix it in a similar way.
As an extra bit of saftey, lower any occurances that might slip through
in brw_fs_lower_regioning.
The primary change is to prevent copy propagation from violating the
restriction. With that change, nothing should be able to generate these
invalid source strides. The modification to fs_visitor::validate should
detect potential problems sooner rather than later.
Previous attempts to implement this Wa when emitting the math
instruction (in brw_eu_emit.c gfx6_math) didn't work for several
reasons. The lowering happens after the SWSB pass, so the scoreboarding
was incorrect (thanks to Curro for finding that). In addition, the
lowering happens after register allocation, so it's impossible to
allocate a non-scalar register to expand the scalar value.
Fixes 113 tests in the dEQP-VK.spirv_assembly.* group on LNL.
v2: Add changes to brw_fs_lower_regioning. Suggested by Curro.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28480>
This previously worked by luck because we were incorrectly calling
nir_opt_remove_phis before calling nir_convert_from_ssa. See also #10727.
No shader-db or fossil-db changes on any Intel platform.
v2: Handle the load_interpolated_input and load_barycentric_at_sample as
separate passes. Based on discussion with Ken starting at
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28136#note_2330424.
Fixes: 74a40cc4b6 ("intel/fs: move lower of non-uniform at_sample barycentric to NIR")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28136>
See HSD 22015614752. We have issues when multiple engines access the
same CCS cacheline in parallel. This can happen in a Vulkan application
that uses different queues to operate on different subresources.
To resolve this, this patch prefers Tile64 when an image has multiple
subresources and disallows CCS if such an image lacks that tiling.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8614
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28284>
WA 22015614752 applies to gfx125 platforms, but the alignment
requirement was only enabled for the subset that has an aux-map. Adjust
the condition to apply it where appropriate.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28284>
We guard surface state encoding of tilings by macros when the encoded
value is not present on certain platforms. For gfx20 however, we added
these macros even when the existing ones for gfx125 were sufficient.
Remove the extra macros.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28284>
We don't check the gfx version when choosing the tiling except when
choosing Tile64. Drop the version check for consistency and to remove
doubts about the order of operations occuring as expected within the
CHOOSE macro.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28284>
This maches specification and better matches the gfx 125 definition.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28505>
It was using the wrong platform definition that only had 1 bit,
filtering by DG2/ACM it shows the correct definition.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28505>
Justing setting all zeroes to STATE_COMPUTE_MODE will do nothing,
the mask of each register must be set for it to change.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28505>
Earlier strategy was to enable always on DG2 but there has been bunch of
issues that indicate this feature is not working correctly. Disable
until we figure out issues with it.
Cc: mesa-stable
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28184>
Reviewer the results from the last nightly run completed using ci-collate tool
(gl.fd.o/gfx-ci/ci-collate) with the 'patch' feature and a bit of human
intervention, these are the changes in the expectations.
Signed-off-by: Sergi Blanch Torne <sergi.blanch.torne@collabora.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28350>
align16 support is only used on Gen9 for 3-source instructions, quad
swizzling, and dPdy calculations. We don't need it for broadcast.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28458>
brw_broadcast and generate_mov_indirect both had similar comments, both
with typos ("insead"). One still referred to IVB bugs, while the other
dropped that during the compiler split. The one that dropped the
comment mentioned "both of these" issues, while citing only one issue;
there was in fact a third issue (no-Q/UQ) that wasn't mentioned in
either comment. One also had some bad grammar in the comments.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28458>
For BROADCAST and MOV_INDIRECT, required_exec_type was returning
brw_int_type(type_sz(t), false), which is an unsigned type. However,
get_exec_type(inst) returns the original type for either Q or UQ.
This meant that has_invalid_exec_type would detect a mismatch and
trigger lowering.
That lowering would insert new 64-bit MOVs, which would need to be
lowered on platforms which don't support Q/UQ. Except, we already
ran that lowering pass earlier. So, the unlowered Q/UQ MOVs would
reach the software scoreboarding pass, and trigger failures in the
inferred_exec_pipe() function, as no pipe is available to handle
64-bit integer operations.
It turns out that we don't need the region lowering pass to do
anything for these opcodes. The generator code for both BROADCAST
and MOV_INDIRECT already handle decomposing Q/UQ operations into
32-bit MOVs when they're not supported. And, it also implicitly
converts to integer types, even for floating point sources. The
inferred_exec_pipe function already special cases them to note
that they'll always be handled on the integer pipe, so that matches.
Just drop the region lowering code for these opcodes.
Cc: mesa-stable
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28458>
We are overriding the type to Q/UQ, so we need to split to two MOVs
if 64-bit integer math is not supported. For reference, Meteorlake
does support 64-bit floats but would still not work correctly here.
See also brw_broadcast(), which does similar indirects but correctly
checks has_64bit_int instead of has_64bit_float.
Cc: mesa-stable
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28458>
The vk_sync_wait() function is already capable of returning some nice
VkResult errors, don't lose information by replacing everything with
vk_queue_set_lost.
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28455>
==134077== 96 bytes in 1 blocks are definitely lost in loss record 1 of 3
==134077== at 0x4840808: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==134077== by 0x6D6F690: vk_default_alloc (vk_alloc.c:26)
==134077== by 0x52EEEBE: vk_alloc (vk_alloc.h:48)
==134077== by 0x52EEEEE: vk_zalloc (vk_alloc.h:56)
==134077== by 0x52EF47E: xe_exec_process_syncs (anv_batch_chain.c:132)
==134077== by 0x52EF8F6: xe_execute_trtt_batch (anv_batch_chain.c:215)
==134077== by 0x5301670: anv_queue_submit_trtt_batch (anv_batch_chain.c:1697)
==134077== by 0x603D135: gfx125_write_trtt_entries (genX_cmd_buffer.c:6091)
==134077== by 0x5370B44: anv_sparse_bind_trtt (anv_sparse.c:595)
==134077== by 0x5370CFC: anv_sparse_bind (anv_sparse.c:629)
==134077== by 0x5370E6E: anv_init_sparse_bindings (anv_sparse.c:670)
==134077== by 0x5328037: anv_CreateBuffer (anv_device.c:5071)
Note to backporters: this is only for when xe.ko is being used and
ANV_SPARSE_USE_TRTT=1 is exported. This is not the regular code path.
Fixes: 18bd00c024 ("anv/trtt: don't wait/signal syncobjs using the CPU anymore")
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28455>
Rework:
* Francisco Jerez: Rebase on 07b9bfacc7 ("intel/compiler: Move
logical-send lowering to a separate file")
* Jordan: Move SHADER_OPCODE_DWORD_SCATTERED_*_LOGICAL from previous
patch, as it seems to make more sense here.
* Jordan: Change `devinfo->has_lsc` ?: to if/else as suggested by idr
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28484>
Reworks:
* Francisco: Rebase on 07b9bfacc7 ("intel/compiler: Move
logical-send lowering to a separate file")
* Jordan: Rebase on 952a523abb ("intel: switch over to unified
atomics")
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28484>
These fields are overlapping with the ones set by brw_message_desc(),
so the latter should be used instead. This fixes corruption of the
LSC message descriptors when inconsistent values are specified through
both helpers, which can happen if the 'inst->mlen' field is modified
during optimization (e.g. by opt_split_sends()).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28484>
We cannot rely on the immediate message descriptor having accurate
values for mlen and rlen at the IR level, since they are updated at
codegen time via 'inst->mlen' and 'inst->size_written', which could
end up with values inconsistent with the message descriptor if
e.g. the split sends optimization had an effect. Instead, define
helpers that do the computation without relying on the message
descriptor, and use the pre-existing
brw_message_desc_mlen()/brw_message_desc_rlen() helpers (fully
equivalent to the lsc helpers deleted here) during disassembly.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28484>
Rework:
* Jordan: Drop FINISHME (s-b Caio)
* Jordan: Use reg_unit() in asserts rather than a ver check (s-b Caio)
* Ian: Make use of reg_unit() in round_components_to_whole_registers()
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28484>
In the common case, fs_inst will have up to 4 sources (the HW
instructions have up to 3, and our representation of SENDs have 4).
Embed such array into the fs_inst, and use it whenever applicable
instead of allocating a new array.
Also change the code to reuse the allocated src array when resizing to
a smaller length.
Between the changes above and the reduced amount of initializing
fs_regs, this reduces fossil-db time by around 2% for Borderlands 3
and Rise of the Tomb Raider, and around 1.5% for Total War Warhammer 3.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28379>
It was by pure luck that all sources (and the result) of nir_dpas_intel
had the same number of components. It is possible to support matrix
sizes where the accumlator matrix and the result matrix are larger
(e.g., 16x8 * 8x16 = 16x16).
This breaks all of the assumptions of NIR's infrastructure for code
generating intrinsics. Fix the by making the accumulator matrix be the
first source. The accumulator and the result will always have the same
dimensions (due to rules of matrix multiplication) and the same type
(due to restructions of the cooperative matrix extension). This forces
them to have the same number of components.
This doesn't fix all the potential problems. NIR expects that all
0-sized sources will have the same number of components. This just
ensures that the result has the correct number of components.
Fixes: 6b14da33ad ("intel/fs: nir: Add nir_intrinsic_dpas_intel")
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28404>
Was previously passing 1, 1, 0 as the regioning. This generated
incorrect disassembly because the encoding for a width of 1 is 0. Use
the enums to ensure the correct values are used.
Fixes: 1c92dad5cb ("intel/disasm: Disassembly support for DPAS")
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28404>
If the destination was the accumulator but is no longer, having the flag
set is not correct. On Xe2 this also causes a validation error.
v2: Reword the comment to be more clear. Suggested by Jordan.
Fixes: efa4e4bc5f ("intel/fs: Introduce regioning lowering pass.")
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28404>