If we did clear a query buffer in compute mode, the flushing needs to
match the engine used for clearing.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 6823ffe70e ("anv: try to keep the pipeline in GPGPU mode when buffer transfer ops")
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28285>
The right one is a few lines below.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 44bf552704 ("anv: allocate border colors for descriptor buffers")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28387>
If entry goes until the last byte of the bo it was being marked as
not valid while it is valid.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28376>
Now that these can come from different releases, with different sets of
patches backported to them, it matters that we use the correct one.
Fixes: 78ea3bb43d ("ci/deqp: use the proper gl/gles releases for deqp-gl*, deqp-gles*, deqp-egl")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28343>
According to 00-mesa-defaults.conf, the only game that seems to care
about fp64_workaround_enabled right now is Doom Eternal. After some
brief testing I couldn't spot any performance difference by setting
shaderFloat64 to true.
We want to set this to true so that DIRT 5 can work, as it looks at
shaderFloat64 and then refuses to launch today.
Link: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9882
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28213>
This is achieved by the following steps:
#ifndef DEBUG => #if !MESA_DEBUG
defined(DEBUG) => MESA_DEBUG
#ifdef DEBUG => #if MESA_DEBUG
This is done by replace in vscode
excludes
docs,*.rs,addrlib,src/imgui,*.sh,src/intel/vulkan/grl/gpu
These are safe because those files should keep DEBUG macro is already excluded;
and not directly replace DEBUG, as we have some symbols around it.
Use debug or NDEBUG instead of DEBUG in comments when proper
This for reduce the usage of DEBUG,
so it's easier migrating to MESA_DEBUG
These are found when migrating DEBUG to MESA_DEBUG,
these are all comment update, so it's safe
Replace comment /* DEBUG */ and /* !DEBUG */ with proper /* MESA_DEBUG */ or /* !MESA_DEBUG */ manually
DEBUG || !NDEBUG -> MESA_DEBUG || !NDEBUG
!DEBUG && NDEBUG -> !(MESA_DEBUG || !NDEBUG)
Replace the DEBUG present in comment with proper new MESA_DEBUG manually
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: David Heidelberg <david.heidelberg@collabora.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28092>
The compiler only references `intel_device_info->subslice_masks` for
ray tracing workloads. Platforms which lack raytracing support can
share a cache even if they differ on this field.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28311>
The 64-bit type lowering for SEL in opt_algebraic had a pre-existing bug
where it only triggered when 64-bit float _and_ integer types were
unsupported. Meteorlake supports 64-bit float but not integer, so we
need to lower Q/UQ in that case still.
When I moved this to a later pass, opt_peephole_sel started generating
Q/UQ SEL instructions which were failing to be lowered.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10867
Fixes: ea423aba1b ("intel/brw: Split out 64-bit lowering from algebraic optimizations")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28328>
This is similar to what RADV implements using the NIR_LOOP_PASS
helpers. I have not used those helpers for a couple of reasons:
1. They use the pointer to the optimization function, which doesn't
work if the same function is called multiple times in one invocation
of the loop (fixable)
2. After fixing them, due to Intel's use of sub-expressions, the amount
of code added to wrap the shared macro becomes more than simply
reimplementing them for the Intel compiler
On most workloads the results are a wash, but on compile heavy
workloads like Cyberpunk 2077 and Rise of the Tomb Raider, I saw
fossil-db runtimes fall by 1-2% on my ICL, with no changes to the
compiled shaders. Caio saw closer to 2.5% on TGL.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27510>
The mlen tracking is in REG_SIZE units, but in Xe2 each GRF has
doubled the size. The optimization can only elide full GRFs, so
round down the amount of trailing zeros to ensure the optimization
will remove only full GRFs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28279>
Folks, there's more than one accumulator. In general, when the
register file is ARF, the upper 4 bits of the register number specify
which ARF, and the lower 4 bits specify which one of that ARF. This
can be further partitioned by the subregister number.
This is already mostly handled correctly for flags register, but lots
of places wanted to check the register number for equality with
BRW_ARF_ACCUMULATOR. If acc1 is ever specified, that won't work.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28281>
Make the filename more descriptive and since the file is used by
multiple drivers, move it into appropriate util/ directory.
Cosmetics:
- use SPDX license tag
- add newline before main function
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27804>
This will be useful to decode HW context in the next patch.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27888>
xe_topic can't be inside of the for loop otherwise it will be set to
TOPIC_INVALID at every iteration.
TOPIC_INVALID was added after it was reviewed by Lionel because CI
complained that xe_topic may be not initialized, turns out leaving it
not initialized was causing the xe_topic value to keep the value set
in the previous interation makeing the parser to work by luck.
Fixes: 90e38bbb3b ("intel/tools/error_decode: Parse Xe KMD error dump file")
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27888>
In what is probably the most common case cross of compilation, x86_64
-> x86, it should be possible to build intel-clc for the host machine
and run it. Doing so simplifies the build by not needing to be able to
cross compile half of mesa, and should ease developer and distro strain
for building Intel drivers for x86.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28222>
The vma_samplers vma heap is initialized unconditionally. Don't use
device->physical->indirect_descriptors as a condition on whether to
free it or not.
From my TGL machine:
==373617== 32 bytes in 1 blocks are definitely lost in loss record 1 of 1
==373617== at 0x48459F3: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==373617== by 0x6926DC0: util_vma_heap_free (vma.c:339)
==373617== by 0x6925ED3: util_vma_heap_init (vma.c:53)
==373617== by 0x5334EDA: anv_CreateDevice (anv_device.c:3404)
==373617== by 0x685593A: vk_tramp_CreateDevice (vk_dispatch_trampolines.c:78)
==373617== by 0x48A6D56: terminator_CreateDevice (loader.c:5833)
==373617== by 0x9C2293F: vulkan_layer_chassis::CreateDevice(VkPhysicalDevice_T*, VkDeviceCreateInfo const*, VkAllocationCallbacks const*, VkDevice_T**) (chassis.cpp:497)
==373617== by 0x48B0690: loader_create_device_chain (loader.c:4937)
==373617== by 0x48B1327: loader_layer_create_device (loader.c:4317)
==373617== by 0x48B8D79: vkCreateDevice (trampoline.c:1004)
==373617== by 0x10CC7A: MyApp::MyApp(int, bool) (sparse.cpp:608)
==373617== by 0x1201E8: main (sparse.cpp:6025)
Fixes: 7c76125db2 ("anv: use 2 different buffers for surfaces/samplers in descriptor sets")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28303>
This was missing, this is implemented in common code.
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28275>
brw_fs_opt_eliminate_find_live_channel eliminates FIND_LIVE_CHANNEL
outside of control flow. None of our optimization passes generate
additional cases of that instruction, so once it's gone, we shouldn't
ever have to run the pass again. Moving it out of the loop should
save a bit of CPU time.
While we're at it, also clean adjacent BROADCAST instructions that
consume the result of our FIND_LIVE_CHANNEL. Without this, we have
to perform copy propagation to get the MOV 0 immediate into the
BROADCAST, then algebraic to turn it into a MOV, which enables more
copy propagation...not to mention CSE gets involved. Since this
FIND_LIVE_CHANNEL + BROADCAST pattern from emit_uniformize() is
really common, and it's trivial to clean up, we can do that. This
lets the initial copy prop in the loop see MOV instead of BROADCAST.
Zero impact on fossil-db, but less work in the optimization loop.
Together with the previous patches, this cuts compile time in
Borderlands 3 on Alchemist by -1.38539% +/- 0.1632% (n = 24).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28286>
It's a logical opcode which is lowered to a send-from-GRF later. That
lowering code is responsible for ensuring the sources are set up in a
proper SEND payload.
This was preventing copy propagation of surface handles which started
out as scalars, were splatted out to full-SIMD values with NoMask, then
actually consumed as only component 0 (scalar again), because we thought
that scalar values were not allowed.
fossil-db on Alchemist shows improvements in q2rtx but no other titles:
Totals:
Instrs: 161310436 -> 161310152 (-0.00%)
Cycles: 14370605159 -> 14370601066 (-0.00%)
Totals from 17 (0.00% of 652298) affected shaders:
Instrs: 16097 -> 15813 (-1.76%)
Cycles: 185508 -> 181415 (-2.21%)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28286>