Commit graph

12020 commits

Author SHA1 Message Date
Lionel Landwerlin
07bf480856 anv: fix protected memory allocations
Using the wrong flag field...

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 5f2c77a10a ("anv: handle protected memory allocation")
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Tested-by: Mark Janes <markjanes@swizzler.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26540>
2024-04-05 09:07:21 +03:00
Ian Romanick
0e817ba548 intel/brw/xe2+: Implement Wa 22016140776
HF sources to math instructions cannot be scalar. This is very similar
to an old Gfx6 restriction on POW, so let's fix it in a similar way.

As an extra bit of saftey, lower any occurances that might slip through
in brw_fs_lower_regioning.

The primary change is to prevent copy propagation from violating the
restriction. With that change, nothing should be able to generate these
invalid source strides. The modification to fs_visitor::validate should
detect potential problems sooner rather than later.

Previous attempts to implement this Wa when emitting the math
instruction (in brw_eu_emit.c gfx6_math) didn't work for several
reasons. The lowering happens after the SWSB pass, so the scoreboarding
was incorrect (thanks to Curro for finding that). In addition, the
lowering happens after register allocation, so it's impossible to
allocate a non-scalar register to expand the scalar value.

Fixes 113 tests in the dEQP-VK.spirv_assembly.* group on LNL.

v2: Add changes to brw_fs_lower_regioning. Suggested by Curro.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28480>
2024-04-04 21:04:09 -07:00
Jordan Justen
50c7d25a9e intel/dev/mesa_defs.json: Add LNL WA entries
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28480>
2024-04-04 21:03:51 -07:00
Ian Romanick
0b67d3d909 intel/elk: Delete stray nir_opt_dce
No shader-db changes on any Intel platform.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28136>
2024-04-04 23:42:28 +00:00
Ian Romanick
24cdbbdaa2 intel/brw: Delete stray nir_opt_dce
No shader-db or fossil-db changes on any Intel platform.

Fixes: f76f4be301 ("intel/compiler: move gen5 final pass to actually be final pass")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28136>
2024-04-04 23:42:27 +00:00
Ian Romanick
44fb57b827 intel/elk: Don't call nir_opt_remove_phis before nir_convert_from_ssa
shader-db:

All platforms had similar results. (Ivy Bridge shown)
total instructions in shared programs: 15831424 -> 15831637 (<.01%)
instructions in affected programs: 38880 -> 39093 (0.55%)
helped: 0 / HURT: 179

total cycles in shared programs: 432140353 -> 432170199 (<.01%)
cycles in affected programs: 11798080 -> 11827926 (0.25%)
helped: 77 / HURT: 123

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28136>
2024-04-04 23:42:27 +00:00
Ian Romanick
6377e8fd29 intel/brw: Don't call nir_opt_remove_phis before nir_convert_from_ssa
Per discussion in #10727, removing phis breaks LCSSA form which in turn
invalidates divergence analysis.

shader-db:

All Skylake and newer platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 20299612 -> 20299695 (<.01%)
instructions in affected programs: 20829 -> 20912 (0.40%)
helped: 6 / HURT: 13

total cycles in shared programs: 842149085 -> 842148399 (<.01%)
cycles in affected programs: 15146222 -> 15145536 (<.01%)
helped: 40 / HURT: 45

fossil-db:

All Intel platforms had similar results. (Ice Lake shown)
Totals:
Instrs: 165505077 -> 165505603 (+0.00%); split: -0.00%, +0.00%
Cycles: 15144183575 -> 15144235695 (+0.00%); split: -0.00%, +0.00%
Spill count: 45213 -> 45220 (+0.02%)
Fill count: 74166 -> 74184 (+0.02%)

Totals from 94 (0.01% of 656116) affected shaders:
Instrs: 263079 -> 263605 (+0.20%); split: -0.00%, +0.20%
Cycles: 28411487 -> 28463607 (+0.18%); split: -0.18%, +0.37%
Spill count: 3474 -> 3481 (+0.20%)
Fill count: 6713 -> 6731 (+0.27%)

Fixes: 6dbb5f1e07 ("intel/fs: rerun divergence analysis prior to convert_from_ssa")
Closes: #10727
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28136>
2024-04-04 23:42:27 +00:00
Ian Romanick
87101e7d83 intel/compiler: Ensure load_barycentric_at_sample and load_interpolated_input remain together
This previously worked by luck because we were incorrectly calling
nir_opt_remove_phis before calling nir_convert_from_ssa. See also #10727.

No shader-db or fossil-db changes on any Intel platform.

v2: Handle the load_interpolated_input and load_barycentric_at_sample as
separate passes. Based on discussion with Ken starting at
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28136#note_2330424.

Fixes: 74a40cc4b6 ("intel/fs: move lower of non-uniform at_sample barycentric to NIR")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28136>
2024-04-04 23:42:27 +00:00
Nanley Chery
c6686fda28 intel/isl: Use Tile64 to align images for CCS WA
See HSD 22015614752. We have issues when multiple engines access the
same CCS cacheline in parallel. This can happen in a Vulkan application
that uses different queues to operate on different subresources.

To resolve this, this patch prefers Tile64 when an image has multiple
subresources and disallows CCS if such an image lacks that tiling.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8614
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28284>
2024-04-04 15:17:50 +00:00
Nanley Chery
b092124186 intel/isl: Enable a 64KB alignment WA for flat-CCS
WA 22015614752 applies to gfx125 platforms, but the alignment
requirement was only enabled for the subset that has an aux-map. Adjust
the condition to apply it where appropriate.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28284>
2024-04-04 15:17:50 +00:00
Nanley Chery
d7bfa8051e intel/isl: Remove a CCS_D check from gfx12+ code
This aux usage isn't used on gfx12+.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28284>
2024-04-04 15:17:50 +00:00
Nanley Chery
8845f1e439 intel/isl: Remove inconsistency when encoding Tile64
We guard surface state encoding of tilings by macros when the encoded
value is not present on certain platforms. For gfx20 however, we added
these macros even when the existing ones for gfx125 were sufficient.
Remove the extra macros.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28284>
2024-04-04 15:17:50 +00:00
Nanley Chery
81d8c071ac intel/isl: Remove inconsistency when choosing Tile64
We don't check the gfx version when choosing the tiling except when
choosing Tile64. Drop the version check for consistency and to remove
doubts about the order of operations occuring as expected within the
CHOOSE macro.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28284>
2024-04-04 15:17:50 +00:00
Rohan Garg
57209a0c7a isl: allow CCS on single sampled TILE64 surfaces
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23030>
2024-04-04 02:17:34 +00:00
Rohan Garg
afb63443a0 intel/blorp: add fast clear rectangle dimensions for single sampled TILE64 CCS surfaces
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23030>
2024-04-04 02:17:34 +00:00
José Roberto de Souza
a47a65c1c2 intel/genxml/xe2: Update definition of INTERFACE_DESCRIPTOR_DATA
This maches specification and better matches the gfx 125 definition.

Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28505>
2024-04-03 20:21:04 +00:00
José Roberto de Souza
0f29b780e1 intel/genxml/gfx125: Fix definition of INTERFACE_DESCRIPTOR_DATA::Thread group dispatch size
It was using the wrong platform definition that only had 1 bit,
filtering by DG2/ACM it shows the correct definition.

Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28505>
2024-04-03 20:21:04 +00:00
José Roberto de Souza
c00c685f84 intel/genxml: Add more instdone registers
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28505>
2024-04-03 20:21:04 +00:00
José Roberto de Souza
2f3dc31876 anv: Set STATE_COMPUTE_MODE mask bit when zeroing compute mode
Justing setting all zeroes to STATE_COMPUTE_MODE will do nothing,
the mask of each register must be set for it to change.

Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28505>
2024-04-03 20:21:04 +00:00
Yonggang Luo
3114917986 util: Turn futex_wake parameter to int32_t for consistence across platforms
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28473>
2024-04-03 00:55:24 +00:00
Eric Engestrom
ff37f68740 meson: add VK_DRIVER_FILES to devenv, alongside the old VK_ICD_FILENAMES
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28516>
2024-04-02 18:08:52 +00:00
Eric Engestrom
96e8648b32 docs: replace references to the deprecated VK_INSTANCE_LAYERS with the new VK_LOADER_LAYERS_ENABLE
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28516>
2024-04-02 18:08:52 +00:00
Tapani Pälli
a87d888546 anv: disable fcv optimization on >= gfx125
Earlier strategy was to enable always on DG2 but there has been bunch of
issues that indicate this feature is not working correctly. Disable
until we figure out issues with it.

Cc: mesa-stable
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28184>
2024-04-02 09:28:18 +00:00
Sergi Blanch Torne
35a9e8577c ci: Nightly run expectations update
Reviewer the results from the last nightly run completed using ci-collate tool
(gl.fd.o/gfx-ci/ci-collate) with the 'patch' feature and a bit of human
intervention, these are the changes in the expectations.

Signed-off-by: Sergi Blanch Torne <sergi.blanch.torne@collabora.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28350>
2024-04-02 07:52:42 +00:00
Kenneth Graunke
9e0d0190ea intel/brw: Drop align16 support in brw_broadcast()
align16 support is only used on Gen9 for 3-source instructions, quad
swizzling, and dPdy calculations.  We don't need it for broadcast.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28458>
2024-04-02 00:00:59 +00:00
Kenneth Graunke
a520c976a5 intel/brw: Drop dead CHV checks.
This compiler no longer supports Cherryview.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28458>
2024-04-02 00:00:59 +00:00
Kenneth Graunke
e3d12cf72f intel/brw: Don't mention gfx7 limitations in shuffle comments
We don't support gfx7 here anymore, so we needn't consider it.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28458>
2024-04-02 00:00:59 +00:00
Kenneth Graunke
1d9e2b761a intel/brw: Update comments for indirect MOV splitting
brw_broadcast and generate_mov_indirect both had similar comments, both
with typos ("insead").  One still referred to IVB bugs, while the other
dropped that during the compiler split.  The one that dropped the
comment mentioned "both of these" issues, while citing only one issue;
there was in fact a third issue (no-Q/UQ) that wasn't mentioned in
either comment.  One also had some bad grammar in the comments.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28458>
2024-04-02 00:00:59 +00:00
Kenneth Graunke
7a24f29fbb intel/brw: Fix lower_regioning for BROADCAST, MOV_INDIRECT on Q types
For BROADCAST and MOV_INDIRECT, required_exec_type was returning
brw_int_type(type_sz(t), false), which is an unsigned type.  However,
get_exec_type(inst) returns the original type for either Q or UQ.
This meant that has_invalid_exec_type would detect a mismatch and
trigger lowering.

That lowering would insert new 64-bit MOVs, which would need to be
lowered on platforms which don't support Q/UQ.  Except, we already
ran that lowering pass earlier.  So, the unlowered Q/UQ MOVs would
reach the software scoreboarding pass, and trigger failures in the
inferred_exec_pipe() function, as no pipe is available to handle
64-bit integer operations.

It turns out that we don't need the region lowering pass to do
anything for these opcodes.  The generator code for both BROADCAST
and MOV_INDIRECT already handle decomposing Q/UQ operations into
32-bit MOVs when they're not supported.  And, it also implicitly
converts to integer types, even for floating point sources.  The
inferred_exec_pipe function already special cases them to note
that they'll always be handled on the integer pipe, so that matches.

Just drop the region lowering code for these opcodes.

Cc: mesa-stable
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28458>
2024-04-02 00:00:59 +00:00
Kenneth Graunke
a90edad9f7 intel/brw: Fix generate_mov_indirect to check has_64bit_int not float
We are overriding the type to Q/UQ, so we need to split to two MOVs
if 64-bit integer math is not supported.  For reference, Meteorlake
does support 64-bit floats but would still not work correctly here.

See also brw_broadcast(), which does similar indirects but correctly
checks has_64bit_int instead of has_64bit_float.

Cc: mesa-stable
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28458>
2024-04-02 00:00:59 +00:00
Paulo Zanoni
817f74748f anv/xe: don't overwrite the result from vk_sync_wait()
The vk_sync_wait() function is already capable of returning some nice
VkResult errors, don't lose information by replacing everything with
vk_queue_set_lost.

Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28455>
2024-04-01 23:36:12 +00:00
Paulo Zanoni
38af7254e2 anv/xe: don't leak xe_syncs during trtt submission
==134077== 96 bytes in 1 blocks are definitely lost in loss record 1 of 3
==134077==    at 0x4840808: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==134077==    by 0x6D6F690: vk_default_alloc (vk_alloc.c:26)
==134077==    by 0x52EEEBE: vk_alloc (vk_alloc.h:48)
==134077==    by 0x52EEEEE: vk_zalloc (vk_alloc.h:56)
==134077==    by 0x52EF47E: xe_exec_process_syncs (anv_batch_chain.c:132)
==134077==    by 0x52EF8F6: xe_execute_trtt_batch (anv_batch_chain.c:215)
==134077==    by 0x5301670: anv_queue_submit_trtt_batch (anv_batch_chain.c:1697)
==134077==    by 0x603D135: gfx125_write_trtt_entries (genX_cmd_buffer.c:6091)
==134077==    by 0x5370B44: anv_sparse_bind_trtt (anv_sparse.c:595)
==134077==    by 0x5370CFC: anv_sparse_bind (anv_sparse.c:629)
==134077==    by 0x5370E6E: anv_init_sparse_bindings (anv_sparse.c:670)
==134077==    by 0x5328037: anv_CreateBuffer (anv_device.c:5071)

Note to backporters: this is only for when xe.ko is being used and
ANV_SPARSE_USE_TRTT=1 is exported. This is not the regular code path.

Fixes: 18bd00c024 ("anv/trtt: don't wait/signal syncobjs using the CPU anymore")
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28455>
2024-04-01 23:36:12 +00:00
Eric Engestrom
51c589234d isl: fix inline c identifier reference -> inline code
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28499>
2024-04-01 21:18:37 +00:00
Rohan Garg
3d68dd78d0 intel/eu/validate: Allow SIMD16 for mixed mode float operations on xe2+
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28484>
2024-04-01 00:00:03 +00:00
Rohan Garg
a368d234c8 intel/brw: Lower DWORD scattered read writes to lsc
Rework:
 * Francisco Jerez: Rebase on 07b9bfacc7 ("intel/compiler: Move
   logical-send lowering to a separate file")
 * Jordan: Move SHADER_OPCODE_DWORD_SCATTERED_*_LOGICAL from previous
   patch, as it seems to make more sense here.
 * Jordan: Change `devinfo->has_lsc` ?: to if/else as suggested by idr

Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28484>
2024-04-01 00:00:03 +00:00
Rohan Garg
b5040bfc3f intel/brw: Handle typed surface and atomic messages for xe2+
Reworks:
 * Francisco: Rebase on 07b9bfacc7 ("intel/compiler: Move
   logical-send lowering to a separate file")
 * Jordan: Rebase on 952a523abb ("intel: switch over to unified
   atomics")

Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28484>
2024-04-01 00:00:03 +00:00
Francisco Jerez
74efde7663 intel/brw/xehp+: Drop redundant arguments of lsc_msg_desc*().
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28484>
2024-04-01 00:00:03 +00:00
Francisco Jerez
f1812437e8 intel/eu/xehp+: Don't initialize mlen and rlen descriptor fields from lsc_msg_desc*().
These fields are overlapping with the ones set by brw_message_desc(),
so the latter should be used instead.  This fixes corruption of the
LSC message descriptors when inconsistent values are specified through
both helpers, which can happen if the 'inst->mlen' field is modified
during optimization (e.g. by opt_split_sends()).

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28484>
2024-04-01 00:00:03 +00:00
Francisco Jerez
fa96274a87 intel/brw/xehp+: Replace lsc_msg_desc_dest_len()/lsc_msg_desc_src0_len() with helpers to do the computation.
We cannot rely on the immediate message descriptor having accurate
values for mlen and rlen at the IR level, since they are updated at
codegen time via 'inst->mlen' and 'inst->size_written', which could
end up with values inconsistent with the message descriptor if
e.g. the split sends optimization had an effect.  Instead, define
helpers that do the computation without relying on the message
descriptor, and use the pre-existing
brw_message_desc_mlen()/brw_message_desc_rlen() helpers (fully
equivalent to the lsc helpers deleted here) during disassembly.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28484>
2024-04-01 00:00:03 +00:00
Ian Romanick
5f9ab41457 intel/brw/xe2: Update uniform handling to account for 512b physical registers
Rework:
 * Jordan: Drop FINISHME (s-b Caio)
 * Jordan: Use reg_unit() in asserts rather than a ver check (s-b Caio)
 * Ian: Make use of reg_unit() in round_components_to_whole_registers()

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28484>
2024-04-01 00:00:03 +00:00
Ian Romanick
8587ef172c intel/brw/xe2: Update brw_nir_analyze_ubo_ranges to account for 512b physical registers
Rework:
 * Jordan: Use `REG_SIZE * reg_unit` (Suggested by Caio)

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28484>
2024-04-01 00:00:03 +00:00
Caio Oliveira
d9e737212d intel/brw: Add a src array for the common case in fs_inst
In the common case, fs_inst will have up to 4 sources (the HW
instructions have up to 3, and our representation of SENDs have 4).
Embed such array into the fs_inst, and use it whenever applicable
instead of allocating a new array.

Also change the code to reuse the allocated src array when resizing to
a smaller length.

Between the changes above and the reduced amount of initializing
fs_regs, this reduces fossil-db time by around 2% for Borderlands 3
and Rise of the Tomb Raider, and around 1.5% for Total War Warhammer 3.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28379>
2024-03-29 22:44:01 +00:00
Caio Oliveira
dae9795628 intel/brw: Remove vestiges of sources on IF opcode, only valid on Gfx6
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28379>
2024-03-29 22:44:01 +00:00
Kenneth Graunke
816a33849a intel/brw: Rearrange fs_inst fields
For better packing, and to make all the small fields easier to hash
and compare en masse.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28379>
2024-03-29 22:44:01 +00:00
Ian Romanick
5e9c01dfe4 intel/brw/xe2+: Use phys_nr and phys_subnr in DPAS encoding
Suggested-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28404>
2024-03-29 21:12:32 +00:00
Ian Romanick
6d85f7129a intel/brw/xe2+: DPAS must be SIMD16 now
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28404>
2024-03-29 21:12:32 +00:00
Ian Romanick
a8115221e5 nir: intel/brw: Change the order of sources for nir_dpas_intel
It was by pure luck that all sources (and the result) of nir_dpas_intel
had the same number of components. It is possible to support matrix
sizes where the accumlator matrix and the result matrix are larger
(e.g., 16x8 * 8x16 = 16x16).

This breaks all of the assumptions of NIR's infrastructure for code
generating intrinsics. Fix the by making the accumulator matrix be the
first source. The accumulator and the result will always have the same
dimensions (due to rules of matrix multiplication) and the same type
(due to restructions of the cooperative matrix extension). This forces
them to have the same number of components.

This doesn't fix all the potential problems. NIR expects that all
0-sized sources will have the same number of components. This just
ensures that the result has the correct number of components.

Fixes: 6b14da33ad ("intel/fs: nir: Add nir_intrinsic_dpas_intel")
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28404>
2024-03-29 21:12:32 +00:00
Ian Romanick
c6bd6f2a41 intel/brw: Use enums for DPAS source regioning
Was previously passing 1, 1, 0 as the regioning. This generated
incorrect disassembly because the encoding for a width of 1 is 0. Use
the enums to ensure the correct values are used.

Fixes: 1c92dad5cb ("intel/disasm: Disassembly support for DPAS")
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28404>
2024-03-29 21:12:32 +00:00
Ian Romanick
be4fa59a72 intel/brw: Clear write_accumulator flag when changing the destination
If the destination was the accumulator but is no longer, having the flag
set is not correct. On Xe2 this also causes a validation error.

v2: Reword the comment to be more clear. Suggested by Jordan.

Fixes: efa4e4bc5f ("intel/fs: Introduce regioning lowering pass.")
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28404>
2024-03-29 21:12:32 +00:00
Rohan Garg
df3a1348d1 intel/brw: minor rework to de duplicate variable assignment
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27235>
2024-03-28 19:53:40 +00:00