Xe2 uses byte offsets rather than OWord offsets. We've been storing the
per-slot offsets in bytes on Xe2 for a while, but kept the global offset
immediate in OWords for some reason, choosing to lower it during logical
send lowering.
This patch makes both offsets (global immediate, per-slot) in the same
units, so they could be added together if necessary without scaling.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38343>
Much clearer, especially since we're dealing with at least four
different kinds of intrinsics. These helpers were introduced years ago,
but probably didn't exist when we first wrote this code.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38343>
I noticed that our backend was completely ignoring writemasks, despite
them appearing on many of the intrinsics we're implementing.
Rhys Perry pointed out that nir_lower_mem_access_bitsizes is removing
all non-trivial writemasking today, so ssbo/global/shared/scratch/etc.
stores should only ever see all components enabled. Which means what
we're doing is legitimate, if non-obvious. Add an assert to make it
obvious.
Thanks a lot to Rhys for helping me rediscover what made this work.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38343>
Use tristate for the aligned setting, otherwise it is always
first disabled which contributes to the condition if we set the
new stride active.
v2: set ByteStride in dword units and take secondary cmdbuf
in to account (Lionel)
Cc: mesa-stable
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Nataraj Deshpande <nataraj.deshpande@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38349>
Release builds are noisy about flush_type and scope being used
uninitialized, even though they are always set.
Initialize them to the final else values to make GCC happy.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38357>
In ray tracing dispatch, we have dispatch.threads set to 0 since we
calculate the local_size_x/y/z based on the launch sizes.
This change takes 0 threads into an account and returh the TG size 8 in
such scenarios. Before this change, we were setting TG size to 2.
Fixes: 0c4e1c9efc ("intel/common: Add helper for compute thread group dispatch size")
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38229>
Drivers already have to track this workaround, so remove the logic
from Blorp and let the driver manage this.
Also in Anv don't accumulate this workaround, emit it directly in
place right after COMPUTE_WALKER. Accumulating can be problematic when
you want to dispatch concurrent compute shaders that do not need any
cache flush interaction (typical example with the internal
simple_shader framework).
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 3e0ad0176b ("anv: Emit state cache invalidation after every compute dispatch")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38306>
CmdWriteAccelerationStructuresPropertiesKHR writes the data with MI
commands, we no longer dispatch shaders to write the properties.
As a result, we don't need to flush untyped cache.
Fixes: f0e18c475b ("intel: remove GRL/intel-clc")
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38291>
Refactor the intermediate buffer copy path to use a generic callback
approach, making the code more maintainable and easier to extend with
new format conversions.
The core copy_intermediate() function is now format-agnostic, accepting
a conversion callback that handles the actual data transformation. This
moves format-specific logic (RGB<->RGBA conversion and ASTC
decompression) into dedicated callback functions, making the conversion
path explicit at each call site rather than hidden inside the copy
function.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37691>
Instead of allocating a buffer for the entire RGB->RGBA conversion
process. Just allocate a smaller buffer that is the size of a tile and
do the conversion one tile at a time.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37691>
The needs_temp_copy() function was incorrectly identifying some
depth/stencil formats as needing RGB<->RGBA conversion.
VK_FORMAT_D32_SFLOAT_S8_UINT maps to PIPE_FORMAT_Z32_FLOAT_S8X24_UINT,
which has 3 channels (F32 depth, UP8 stencil, X24 padding). The
component count check (== 3) was matching this as an RGB color format,
causing depth/stencil images to incorrectly use the RGB conversion path.
Add an explicit vk_format_is_depth_or_stencil() check before the
component count test to ensure depth/stencil formats always use the
direct copy path.
Fixes: f97b51186f ("anv: intermediate RGB <-> RGBX copy for HIC")
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37691>
Some shaders, especially RTPSO shaders that have parts of the PSO
inlined, can become absolutely huge. Using a sparse bitset avoids
quadratic complexity in memory consumption for the liveness information.
This reduces peak memory usage in worst-case tests (hammering
compilation of many huge RTPSOs on 32 threads concurrently) by ~60%,
from 43GB to 18GB.
CPU time (seconds) differences for a workload with mostly small shaders:
Difference at 95.0% confidence
-5.27 +/- 1.08963
-0.88811% +/- 0.183626%
(Student's t, pooled s = 0.629735)
Peak resident set usage for the mostly-small workload:
Difference at 95.0% confidence
30809 +/- 13394.3
1.59276% +/- 0.69246%
(Student's t, pooled s = 7741.09)
CPU time for the heavy workload did not show any difference.
Co-authored-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37908>
The code in question multiplies `uint32_t`s together and assigns them to
a `uint64_t`. It seems rather unlikely at there would be an overflow,
but we might as well do the cast.
CID: 1649587
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38289>
Our exposed limits say we shouldn't be able to, but let's add an assert
in case something changes, and to help Coverity out.
CID: 1662103
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37583>
When available, query and use explicit layout info otherwise fallback to
implicit layout with tiling query.
This fixes aligned layouts of multi-planar formats that were getting
misaligned when adding surfaces with implicit layouts.
Signed-off-by: Juston Li <justonli@google.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38039>
When EXT_descriptor_buffer is not enabled, we can assume we're in
legacy descriptor mode and not do any switching for secondaries.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Casey Bowman <casey.g.bowman@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38256>
We skip the stall emission for STATE_BASE_ADDRESS since this one can
be skipped on Gfx12.5+ and instead add a new sba tracepoint that has
valid timestamps.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 0147908a89 ("anv: predicate emission of STATE_BASE_ADDRESS")
Reviewed-by: Casey Bowman <casey.g.bowman@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38256>
Some new CTS tests have geometry shader looking like this :
void main()
{
gl_Position = gl_in[0].gl_Position;
EmitVertex();
EndPrimitive();
// <-- some storage buffer write
}
The generate shader has :
- a message to write the position
- a message to write to the storage buffer
- a final message to end the thread
This generates an empty EOT URB messages which is apparently not legal
(simulation complains, HW hangs) :
send(8) nullUD g126UD nullUD 0x04088007 0x00000000
urb MsgDesc: offset 0 SIMD8 write masked mlen 2 ex_mlen 0 rlen 0 { align1 1Q A@1 EOT };
Instead emit a write with actual data and the mask set at 0 to discard
the effect :
mov(8) g127<1>UD 0x00000000UD { align1 WE_all 1Q };
mov(8) g125<1>UD 0x00000000UD { align1 1Q };
send(8) nullUD g126UD g125UD 0x04088007 0x00000040
urb MsgDesc: offset 0 SIMD8 write masked mlen 2 ex_mlen 1 rlen 0 { align1 1Q A@1 EOT };
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38243>
pps initializes perf counter multiple times, once from
GpuDataSource::register_data_source and once from
GpuDataSource::OnSetup. This is fine, except we should replace
failing assert with skip on second call.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38224>
Note that currently autostrip is disabled globally with
Wa_14021490052 for some gfx versions and steppings.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37975>
These registers need to be whitelisted by kernel so that we can use
it to disable autostrip at will. This is about Wa_14024997852.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37975>
Coverity notices that in this case part of the decision to go down the
locked path invovles reading a flag, which is turn set inside the
protected code. Additional threads can decide they need to go down the
locked path, and then wait on the lock, even though the first thread
willse the device_registered to true, which would have otherwise
prevented them from going down the locked path.
What Coverity doesn't know, is that it is a violation of the Vulkan API
contract to call this function from two different threads, so in
practice that cann't happen. We'll move the setting of the
image_device_registered out of the locked area to see if that pacifies
Coverity, and if not then we'll just ignore it.
CID: 1662067
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37582>
No shader-db changes on any Intel platform.
fossil-db:
Skylake
Intel(R) HD Graphics 530 (SKL GT2)
Totals:
Cycle count: 57669758527 -> 57669757913 (-0.00%); split: -0.00%, +0.00%
Totals from 10 (0.00% of 1736875) affected shaders:
Cycle count: 274949 -> 274335 (-0.22%); split: -0.36%, +0.14%
This change is likely due to subtle differences of different registers
being allocated.
In addition, fossils/google-meet-clvk/BgBlur.1f58fdf742c27594.1.foz and
fossils/google-meet-clvk/Relight.1f58fdf742c27594.1.foz stopped failing
EU validation on Gfx9 platforms.
Closes: #14171
Fixes: e7b7d572b3 ("intel/fs/ra: Re-arrange interference setup")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38122>