Folks, there's more than one accumulator. In general, when the
register file is ARF, the upper 4 bits of the register number specify
which ARF, and the lower 4 bits specify which one of that ARF. This
can be further partitioned by the subregister number.
This is already mostly handled correctly for flags register, but lots
of places wanted to check the register number for equality with
BRW_ARF_ACCUMULATOR. If acc1 is ever specified, that won't work.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28281>
Make the filename more descriptive and since the file is used by
multiple drivers, move it into appropriate util/ directory.
Cosmetics:
- use SPDX license tag
- add newline before main function
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27804>
This will be useful to decode HW context in the next patch.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27888>
xe_topic can't be inside of the for loop otherwise it will be set to
TOPIC_INVALID at every iteration.
TOPIC_INVALID was added after it was reviewed by Lionel because CI
complained that xe_topic may be not initialized, turns out leaving it
not initialized was causing the xe_topic value to keep the value set
in the previous interation makeing the parser to work by luck.
Fixes: 90e38bbb3b ("intel/tools/error_decode: Parse Xe KMD error dump file")
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27888>
In what is probably the most common case cross of compilation, x86_64
-> x86, it should be possible to build intel-clc for the host machine
and run it. Doing so simplifies the build by not needing to be able to
cross compile half of mesa, and should ease developer and distro strain
for building Intel drivers for x86.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28222>
The vma_samplers vma heap is initialized unconditionally. Don't use
device->physical->indirect_descriptors as a condition on whether to
free it or not.
From my TGL machine:
==373617== 32 bytes in 1 blocks are definitely lost in loss record 1 of 1
==373617== at 0x48459F3: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==373617== by 0x6926DC0: util_vma_heap_free (vma.c:339)
==373617== by 0x6925ED3: util_vma_heap_init (vma.c:53)
==373617== by 0x5334EDA: anv_CreateDevice (anv_device.c:3404)
==373617== by 0x685593A: vk_tramp_CreateDevice (vk_dispatch_trampolines.c:78)
==373617== by 0x48A6D56: terminator_CreateDevice (loader.c:5833)
==373617== by 0x9C2293F: vulkan_layer_chassis::CreateDevice(VkPhysicalDevice_T*, VkDeviceCreateInfo const*, VkAllocationCallbacks const*, VkDevice_T**) (chassis.cpp:497)
==373617== by 0x48B0690: loader_create_device_chain (loader.c:4937)
==373617== by 0x48B1327: loader_layer_create_device (loader.c:4317)
==373617== by 0x48B8D79: vkCreateDevice (trampoline.c:1004)
==373617== by 0x10CC7A: MyApp::MyApp(int, bool) (sparse.cpp:608)
==373617== by 0x1201E8: main (sparse.cpp:6025)
Fixes: 7c76125db2 ("anv: use 2 different buffers for surfaces/samplers in descriptor sets")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28303>
This was missing, this is implemented in common code.
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28275>
brw_fs_opt_eliminate_find_live_channel eliminates FIND_LIVE_CHANNEL
outside of control flow. None of our optimization passes generate
additional cases of that instruction, so once it's gone, we shouldn't
ever have to run the pass again. Moving it out of the loop should
save a bit of CPU time.
While we're at it, also clean adjacent BROADCAST instructions that
consume the result of our FIND_LIVE_CHANNEL. Without this, we have
to perform copy propagation to get the MOV 0 immediate into the
BROADCAST, then algebraic to turn it into a MOV, which enables more
copy propagation...not to mention CSE gets involved. Since this
FIND_LIVE_CHANNEL + BROADCAST pattern from emit_uniformize() is
really common, and it's trivial to clean up, we can do that. This
lets the initial copy prop in the loop see MOV instead of BROADCAST.
Zero impact on fossil-db, but less work in the optimization loop.
Together with the previous patches, this cuts compile time in
Borderlands 3 on Alchemist by -1.38539% +/- 0.1632% (n = 24).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28286>
It's a logical opcode which is lowered to a send-from-GRF later. That
lowering code is responsible for ensuring the sources are set up in a
proper SEND payload.
This was preventing copy propagation of surface handles which started
out as scalars, were splatted out to full-SIMD values with NoMask, then
actually consumed as only component 0 (scalar again), because we thought
that scalar values were not allowed.
fossil-db on Alchemist shows improvements in q2rtx but no other titles:
Totals:
Instrs: 161310436 -> 161310152 (-0.00%)
Cycles: 14370605159 -> 14370601066 (-0.00%)
Totals from 17 (0.00% of 652298) affected shaders:
Instrs: 16097 -> 15813 (-1.76%)
Cycles: 185508 -> 181415 (-2.21%)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28286>
We don't necessarily want to split up MOVs for 64-bit addresses into
2x 32-bit MOVs right away, as this makes things like copy propagating
the whole address around harder. We should do this late, once, while
still doing other algebraic optimizations earlier.
fossil-db results for Alchemist show tiny improvements:
Totals:
Instrs: 161310502 -> 161310436 (-0.00%); split: -0.00%, +0.00%
Cycles: 14370605606 -> 14370605159 (-0.00%); split: -0.00%, +0.00%
Totals from 33 (0.01% of 652298) affected shaders:
Instrs: 15053 -> 14987 (-0.44%); split: -0.64%, +0.20%
Cycles: 196947 -> 196500 (-0.23%); split: -0.25%, +0.02%
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28286>
Pci revision was included in the shader cache key because it can
enable platform workarounds. While some platform workarounds exist in
the compiler, none are dependent on the silicon stepping.
Many platforms differ only in the pci revision id, causing needless
duplication in cache entries between platforms.
When a platform ships publicly with stepping-specific compiler
workarounds, pci id must be incorporated into the shader cache key.
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28085>
Emitting UNDEF is only necessary when the instructions we generate to
produce the NIR def are considered partial writes. By adding a simple
check (adapted from fs_inst::is_partial_write()), we can avoid creating
loads of unnecessary UNDEFs that we have to clean up later.
Our first dead code elimination pass does get rid of them pretty
quickly, but this should save memory and time during our first
split_virtual_grfs and dead_code_elimination passes.
This generates roughly 30% fewer instructions at the beginning.
Improves compilation time of shaders:
- Rise of the Tomb Raider: -3.51563% +/- 0.103951% (n=7)
- Borderlands 3: -3.64422% +/- 0.300951% (n=7).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28169>
Outside SIMD1 instructions, a destination stride of zero doesn't make
any sense. When such strides exist, they would be fixed by the FS
generator. Currently the only place that intentionally generates such a
stride is setup_barrier_message_payload_gfx125, and this commit changes
that.
The existence of a zero stride that won't really be a zero stride causes
a variety of problems with other optimization passes. Those passes don't
know that 0 actually means 1, and they make incorrect assumptions about
sizes written, etc.
The assertion helped catch many bugs in some other work in progress that
tries to store convergent values in SIMD8 registers regardless of the
dispatch width. That code would accidentally generate destination
strides of zero.
v2: Check stride differently depending on register file. Suggested by
Caio.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28256>
The stride given in the shader is in number of elements of the of the
type pointed by the given pointer, which may not match the matrix own
element type.
Since we cast the pointer to match the element type, the stride needs to
be ajusted accordingly.
v2:
- Fix mismatching bit-width in matrix element type and pointer type (Caio)
- Do the stride calculation in one place
Fixes dEQP-VK.compute.pipeline.cooperative_matrix.khr_*.multicomponent.*
Fixes: 3a35f8b29b ("intel/cmat: Lower cmat_load and cmat_store")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10820
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27903>
In 96e0d979a7, the restriction was dropped because we don't compile a
SIMD8 program on Xe2. This change moves it to run_fs() so the
restriction will be added when compiling SIMD16 on Xe2.
Fixes: 96e0d979a7 ("intel/fs: Check fs_visitor instance before using it")
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28191>
For this particular case only it doesn't matter. Fixes some new CTS
tests with small inline uniform sizes.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28040>
Xe KMD needs VMs to be created to work.
Setting this on Xe KMD code path allow us to simply a feature check
in init_queue_families().
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28161>
Don't make sense to only set it in VkGetPhysicalDeviceQueueFamilyProperties2().
Not setting it to the code path without pdevice->engine_info because
the protected support landed on i915 after DRM_I915_QUERY_ENGINE_INFO.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28161>
Having just one place to check the Sparse type is less error prone.
For example in i915 it was always setting sparse_uses_trtt to true
even if running in gfx 9 that don't support sparse.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28161>